Subject Re: UTF8, malformed string error
Author Roman Rokytskyy
> Well, explicitly calling a "long string to UTF8 string" conversion
> works; the strings get inserted.
>
> Reading them fails and gets you the wrong character set; IBO doesn't
> seem to support all that UTF8 stuff.

Then you have to call the "UTF8 string to long string" conversion
routine when reading :)

> What does Firebird expect here? Should clients transform the server
> side chars to something at the client? I guess, when they request it
> from the server with UTF8, it returns them as such. I just connected
> with ISO8859_1 and now the strings are returned correctly, so that
> would be a server side translation, correct?

Firebird will translate characters from the charset of the particular
field (or the default db charset) into the charset that was specified
in the lc_ctype property in the DPB. If nothing was specified,
lc_ctype=NONE and data are stored and read "as is" (note that this
very likely won't work for the new UTF8 charset, though UNICODE_FSS
will "swallow" it).
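
To picture the read path, here is a minimal sketch in Python. It is a
model of the behaviour described above, not Firebird code: the
read_field function and the mapping from Firebird charset names to
Python codec names are assumptions made for illustration, and the
sketch ignores any validation the server may perform.

```python
# Hypothetical model of connection-charset translation on read.
# CODECS maps Firebird charset names to Python codec names (an assumption).
CODECS = {"UTF8": "utf-8", "ISO8859_1": "latin-1"}

def read_field(stored: bytes, field_charset: str, lc_ctype: str) -> bytes:
    """Return the bytes a client would receive for a stored column value."""
    if lc_ctype == "NONE" or field_charset == "NONE":
        # No translation: the bytes are handed over "as is".
        return stored
    # Otherwise decode from the field charset and re-encode for the client.
    text = stored.decode(CODECS[field_charset])
    return text.encode(CODECS[lc_ctype])

stored = "Grüße".encode("utf-8")                      # column declared as UTF8
as_latin1 = read_field(stored, "UTF8", "ISO8859_1")   # server-side translation
as_utf8 = read_field(stored, "UTF8", "UTF8")          # same charset, unchanged
raw = read_field(stored, "UTF8", "NONE")              # untranslated bytes
```

In this model a client connecting with ISO8859_1 gets single-byte
characters back, while a client connecting with UTF8 gets the stored
multi-byte sequence and has to decode it itself.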

> Either way -- there's another problem.
>
> How do I know when to call these, given the generic nature of this
> application?

Specify lc_ctype when connecting to the database. If all your
applications do this and the database has correctly specified charsets
for its fields, you will always obtain data encoded according to your
lc_ctype.

> All I know is "string" or "widestring" in Delphi, nothing about
> encodings.

If I'm not mistaken, the wide string is UCS2 (Unicode, 2 bytes per
character). Theoretically IBO could have an AsWideString property and
perform the conversion according to the charset you have specified in
the DPB.

Also, I did not check the IBO sources, but I am pretty sure that the
AsString property simply converts the specified string bytewise into a
byte array. And here is the problem: somewhere (I guess in IBO) the
0x00 character is considered to be the string terminator. When you
assign a wide string, it always has 2 bytes per character, and on a
little-endian platform that would be something like 0x65 0x00
0x66 0x00 0x67 0x00 and so on.
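
The byte layout is easy to check. The sketch below uses Python's
"utf-16-le" codec as a stand-in for a Delphi wide string on a
little-endian machine (that equivalence is an assumption that holds
for plain ASCII letters like these):

```python
# "efg" as 2-byte little-endian code units: every second byte is 0x00.
wide = "efg".encode("utf-16-le")
wide_hex = wide.hex(" ")          # "65 00 66 00 67 00"

# The same text in UTF-8 contains no zero bytes at all.
utf8 = "efg".encode("utf-8")
utf8_has_nul = 0 in utf8          # False

print(wide_hex)
print(utf8_has_nul)
```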

Now, when you give that string to IBO, somewhere along the way some
component treats 0x00 as a C-string terminator and stops processing
the data. I am almost 100% sure that it is neither fbclient nor the
database engine, since the XSQLVAR structure has both sqllen and
sqldata fields, and sqllen contains the length of the passed string in
bytes (it must correspond to what we try to send to the server). I do
not remember exactly, but one of the test cases in our JDBC driver
inserts 0x00 in the middle of a byte array into the database and then
reads it back. That works both in the pure Java protocol version and
when we use fbclient.dll/fbembed.dll.
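
The difference between the two ways of reading a buffer can be
sketched with Python's ctypes module. This only illustrates the
general mechanism: the (length, buffer) pair stands in for sqllen and
sqldata, and nothing here is the real XSQLVAR structure or IBO code.

```python
import ctypes

# A wide string with embedded zero bytes: b"e\x00f\x00g\x00".
payload = "efg".encode("utf-16-le")

# Stand-ins for sqldata (the buffer) and sqllen (its length in bytes).
sqldata = ctypes.create_string_buffer(payload)
sqllen = len(payload)

# Reading the buffer as a C string stops at the first 0x00 byte.
c_string_view = sqldata.value          # b"e"

# Reading exactly sqllen bytes keeps the embedded zero bytes.
length_view = sqldata.raw[:sqllen]     # b"e\x00f\x00g\x00"
```

A component that goes by the length sees all six bytes; one that
treats the buffer as a C string sees only the first character, which
matches the truncation described above.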

Hope this helps.

Roman