Subject | Re: [firebird-support] Bug with character sets |
---|---|
Author | Brad Pepers |
Post date | 2009-05-20T01:07:40Z |
On 19-May-09, at 8:37 AM, Kjell Rilbe wrote:
> Dimitry Sibiryakov wrote:
>>> I totally fail to see why the client library knows nothing about this?
>>> Isn't it the client library that is the "glue" between the network
>>> protocol and the client application? Yes, it is, so it should present
>>> properly encoded character strings to the client application.
>>
>> It is "glue", right, but the network protocol does not include the
>> actual data length in characters, only in bytes.
>
> But you *do* get info about charset id, right? And from that you
> (fbclient) can determine the actual field size in characters using the
> method Milan described. From that, you would have to be able to parse
> the data in the encoding used to find the byte size of the actual
> data.
> This last step would probably require a lot of "knowledge" in fbclient of
> all possible character encodings, which I expect would bloat that dll a
> bit more than anyone would like. Or would it?
Or just have a character length field in the data you receive and have the
server calculate it using the character set information it must already
have, so that on the client side I just get data I can use without jumping
through hoops!
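In the meantime, assuming the server is 2.1 or later (where CHAR_LENGTH()
and OCTET_LENGTH() exist), I suppose I could already ask the server for
both lengths explicitly in the query itself; something like this, with a
made-up table and column name:

static const char *len_probe =
    "select some_utf8_column,"
    "       char_length(some_utf8_column),"    /* length in characters */
    "       octet_length(some_utf8_column)"    /* length in bytes */
    "  from some_table";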
Actually, thinking about it now, I'm not even sure I understand exactly
what kind of data I'll get back in different situations. Let's say that
when I create the database I set the default character set to UTF8. I
create a table with two char(4) columns and on one of them I specify a
character set of ISO8859_1, leaving the other to use the default. Then I
create another two columns, using varchar(4) this time, and again specify
ISO8859_1 for one of them and leave the other to use UTF8. I insert some
data into the table. The UTF8 columns get a string made of a single-byte
character and a double-byte character, so the string length is 2
characters and the bytes used are 3. The ISO8859_1 columns just get a
two-character/two-byte string.
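Written out, the setup I have in mind is roughly this, as SQL strings the
way I'd hand them to isc_dsql_execute_immediate() from C. The table and
column names and the sample value 'aé' are just made up for illustration
(assuming the literal ends up UTF8-encoded, 'aé' is one single-byte plus
one double-byte character, i.e. 2 characters and 3 bytes):

/* Schema from the description above; names and sample data are made up. */
static const char *ddl =
    "create table cs_test ("
    "  c_iso  char(4)    character set ISO8859_1,"
    "  c_utf8 char(4),"                  /* database default: UTF8 */
    "  v_iso  varchar(4) character set ISO8859_1,"
    "  v_utf8 varchar(4)"                /* database default: UTF8 */
    ")";

static const char *fill =
    "insert into cs_test values ('aé', 'aé', 'aé', 'aé')";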
Now I connect from a client using the C API and specify that I'll be using
UTF8 as the connection character set. I select from the table. What do I
get? Does the server convert all the column data to UTF8? Or do I need to
look at the character set id of each column and, if it isn't UTF8, convert
from whatever it is using to UTF8 myself? From what I've seen, the char(4)
columns return a size of 4 (for the ISO8859_1 column) or 16 (for the UTF8
column), but what will the varchar(4) columns return? For the UTF8 column,
will it return the byte size of 3 bytes or the character length of 2 UTF8
characters?
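To make the question concrete, this is roughly how I'm poking at it from C:
just prepare the statement and dump what the descriptor says for each
column. It's only a sketch; the database path and the SELECT are
placeholders, error checking and user/password handling are omitted, and
the idea that the low byte of sqlsubtype holds the character set id for
text columns is my reading of the API guide, so correct me if that part is
wrong.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ibase.h>

int main(void)
{
    ISC_STATUS status[20];
    isc_db_handle db = 0;
    isc_tr_handle tr = 0;
    isc_stmt_handle stmt = 0;

    /* DPB: request UTF8 as the connection character set (plus whatever
       user/password entries your setup needs). */
    char dpb[64];
    short dpb_len = 0;
    dpb[dpb_len++] = isc_dpb_version1;
    dpb[dpb_len++] = isc_dpb_lc_ctype;
    dpb[dpb_len++] = 4;
    memcpy(dpb + dpb_len, "UTF8", 4);
    dpb_len += 4;

    isc_attach_database(status, 0, "test.fdb", &db, dpb_len, dpb);
    isc_start_transaction(status, &tr, 1, &db, 0, NULL);
    isc_dsql_allocate_statement(status, &db, &stmt);

    /* Preparing with an output XSQLDA also describes the select list,
       filling in one XSQLVAR per column (assumes at most 10 columns). */
    XSQLDA *out = (XSQLDA *) calloc(1, XSQLDA_LENGTH(10));
    out->version = SQLDA_VERSION1;
    out->sqln = 10;
    isc_dsql_prepare(status, &tr, &stmt, 0, "select * from cs_test", 3, out);

    for (int i = 0; i < out->sqld; i++) {
        XSQLVAR *v = &out->sqlvar[i];
        printf("%.*s: sqltype=%d sqllen=%d bytes charset=%d\n",
               v->sqlname_length, v->sqlname,
               v->sqltype & ~1,        /* strip the nullable flag bit */
               v->sqllen,              /* declared length, in bytes */
               v->sqlsubtype & 0xFF);  /* low byte: character set id (my
                                          reading of the docs) */
    }

    isc_dsql_free_statement(status, &stmt, DSQL_drop);
    isc_rollback_transaction(status, &tr);
    isc_detach_database(status, &db);
    free(out);
    return 0;
}

If nothing else, dumping sqllen and the character set id for all four
columns should show whether the server describes everything in the
connection character set or in each column's own one.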
Is there a document on all this that I've failed to find?
--
Brad