Subject Re: [firebird-support] Bug with character sets
Author Dimitry Sibiryakov
>>> 1. What does Milan's code know that fbclient doesn't?
>> Length of string in characters.
>
> I understand that Milan derives that from 1) buffer size and 2) max
> number of bytes per character for the encoding used. The latter is
> deduced from a charset id. Where does he get that charset id?

From the SQLVAR structure, but your assumption is wrong. One can't get the
right string length from buffer size and number of bytes per character.
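
To illustrate, here is a minimal sketch (not fbclient code; the name
utf8_count is made up for this example) that assumes the buffer holds
well-formed UTF-8. It counts code points by skipping continuation bytes,
which is the only way to learn how many characters a given buffer holds.

```c
#include <stddef.h>

/* Count code points in a UTF-8 buffer of `len` bytes.
   Assumes well-formed UTF-8: every byte that is not a
   continuation byte (10xxxxxx) starts a new code point. */
size_t utf8_count(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the 4-byte buffers "abcd" and "\xC3\xA9\xC3\xA9" (the letter e with
acute accent, twice) this gives 4 and 2, while buffer size divided by 4
bytes per character gives 1 for both.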

>>> 2. Where does Milan's code get that info from?
>> From conversion from UTF8 to UTF16 (I have a feeling that Milan's
>> procedure can give incorrect results if it encounters a character with
>> a 4-byte code, though I most likely am wrong).
>
> Not really - that's how he does the trimming, not how he deduces the
> correct length in codepoints.

These tasks are absolutely the same.
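
A rough sketch of why, assuming well-formed UTF-8 (the function name
utf8_prefix_bytes is hypothetical, not anything from fbclient or Milan's
code): to cut a buffer after N characters one has to walk it and count
code points, which is the same scan that measures the length.

```c
#include <stddef.h>

/* Return the number of bytes occupied by the first `max_chars`
   code points of a UTF-8 buffer, or `len` if it holds fewer.
   Assumes well-formed UTF-8. */
size_t utf8_prefix_bytes(const unsigned char *buf, size_t len,
                         size_t max_chars)
{
    size_t chars = 0;
    size_t i = 0;
    while (i < len) {
        if ((buf[i] & 0xC0) != 0x80) {  /* lead byte: new code point */
            if (chars == max_chars)
                break;                  /* stop before the next one */
            chars++;
        }
        i++;
    }
    return i;
}
```

Given a 12-byte buffer holding "\xC3\xA9" "ab" followed by eight spaces
and max_chars = 3, it returns 4, the byte length of the first three
characters.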

>>> 3. Why doesn't fbclient have that info?
>> Because fbclient doesn't know what UTF8 is and how to convert it into
>> UCS. FB client is a rather simple tosser of data from transport packets
>> into client structures and back.
>
> Does fbclient know the charset id?

Yes, but it has no idea what this id means.
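
As an illustration only: to make use of the id, the client would need
at least a table like the one sketched below. The struct and function
names are hypothetical, and the ids and byte widths are the ones I
believe RDB$CHARACTER_SETS lists for these few charsets. Even then it
only gives the maximum width, not the real string length.

```c
#include <stddef.h>

/* Hypothetical lookup table: charset id -> max bytes per character.
   Only a few entries are shown; values are assumed to match
   RDB$CHARACTER_SETS. */
struct charset_info {
    int id;
    const char *name;
    int max_bytes;
};

static const struct charset_info known_charsets[] = {
    {0, "NONE", 1},
    {1, "OCTETS", 1},
    {2, "ASCII", 1},
    {3, "UNICODE_FSS", 3},
    {4, "UTF8", 4},
};

/* Return max bytes per character for a charset id,
   or 0 if the id is not in the table. */
int charset_max_bytes(int id)
{
    size_t n = sizeof known_charsets / sizeof known_charsets[0];
    for (size_t i = 0; i < n; i++) {
        if (known_charsets[i].id == id)
            return known_charsets[i].max_bytes;
    }
    return 0;
}
```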

>>> 4. Can FB be changed so fbclient can get that info in the future, and
>>> use it to trim the buffer to the right size before passing it to the
>>> client application?
>> Maybe, but so far nobody knows a good way to accomplish that.
>
> Can fbclient be provided with the charset id?

It is already provided, but fbclient has no use for it.

> I think you're wrong here. Any author of a library or framework should
> really strive to make the interfaces intuitive and easy to understand.
> Anything else will cause a lot of frustration and a lot of unnecessary
> application bugs.

So do authors of API wrappers such as IBProvider or FIB+. Few
applications today use the API directly.

> But even if the answer to 4 is that there *is* an easy way to provide
> fbclient with the necessary info to trim the result (i.e. the charset
> id), I'm not convinced fbclient should be bloated with the required
> capability to parse and trim all charsets available in Firebird.

Here you are exactly right. That's why fbclient has no idea about
character sets: it would require adding the whole INTL module to it,
including ICU.

SY, SD.