Subject Re: Setting VARCHAR length in a FB UDF
Author Fred Polizo
--- In firebird-support@yahoogroups.com, "Ivan Prenosil"
<Ivan.Prenosil@s...> wrote:
> Fred,
>
> Any progress ? Have you already succeeded with returning VARCHARS
from UDF ?
>
> Ivan

Thanks for asking. Yes, I "think" I found the problem. It appears
that FB doesn't always handle binary strings (OCTETS) correctly for
VARCHARs (and maybe CHARs, in some cases). Of course, I've never used
IB and FB before this; so, I could be profoundly wrong! I'll try to
briefly explain how I reached my conclusion and perhaps someone will
kindly let me know if I'm off base? :)

I fetched the source for FB RC4 via CVS and had a look. Various
sections of code make it clear that, yes, VARCHAR length is supposed
to be a simple USHORT with no special encoding. [Note: the dubious
advice to store an ASCII space char into the MSB of the length is
wrong, but I now know why it appears to help (explained below).]
Thus, my UDF functions handle VARCHARs correctly and should work, yet
do not! So, I built a debug (linux) version of FB and ran ISQL in the
debugger to see what happens after my UDF function returns a VARCHAR
parameter. I found a number of "problems" in the FB code.

One problem is in the FUN_evaluate() function, in src/jrd/fun.epp, at
line 544. This statement calls strlen() on my returned binary VARCHAR
data! Strlen() reads the MSB of the length field of the VARCHAR,
typically zero, stops and returns a length of zero. This is why my
valid UDF binary VARCHARs end up as empty column values!

This also explains why inserting an ASCII space into the MSB appears
to partially work. The MSB of the length is then nonzero and strlen()
happily reads on until it sees the next zero byte in the binary
string. Not exactly what one wants to happen. :) It seems to me that
this statement should check for the binary/OCTET character set and use
the length value instead of strlen() for binary strings. I may attempt
to create a patch, if I can figure out the proper way to do this in
this section of the code.

Through visual inspection of the source code I found more places that
don't seem to fully take into account binary string types. I hesitate
to call these bugs due to my unfamiliarity with FB code. So, let's
just say I have questions about why some code can ignore the OCTET
character set issue:

1) CVT2_compare(), in src/jrd/cvt2.cpp. Lines 243-263 and 335-353
assume that the string fill character is an ASCII space. However,
CVT_move() in cvt.cpp seems to correctly use 0x00 for fill characters
in binary strings. Is there some reason CVT2_compare() doesn't do
something similar?

2) add_row(), in src/isql/isql.epp. Line 1900 calls strncpy() to copy
text into the allocated buffer. Shouldn't it check to see if the
column uses OCTETS? It would seem that in this case, the code should
attempt to convert the input text from the, assumed, hex
representation into its binary representation. Or, am I missing
something here?

3) print_item(), in src/isql/isql.epp. Line 5833-5846 assumes that all
SQL_VARYING types can be passed directly to ISQL_printf() and
sprintf(). However, if you look at lines 5728-5744, the SQL_TEXT type
case checks for OCTETS and prints the hex representation for binary
strings. Shouldn't the SQL_VARYING case do the same thing?

4) process_statement(), in src/isql/isql.epp. Various sections of code
in this function seem to check for OCTETS in SQL_TEXT and SQL_VARYING
columns. While others do not. Is this correct?

Answers to the above questions and general comments or insights would
be greatly appreciated.

Thanks,
Fred P.