Subject Re: [firebird-support] Bug with character sets
Author Kjell Rilbe
Dimitry Sibiryakov wrote:

>>Now Dimitry Sibiryakov claims that fbclient doesn't receive enough info
>>from the server to be able to trim to the right number of codepoints (or
>>bytes for that matter), while Milan Babuskov actually does just that in
>>FlameRobin.
>>
>>So:
>>
>>1. What does Milan's code know that fbclient doesn't?
>
> Length of string in characters.

As I understand it, Milan derives that from 1) the buffer size and 2) the
maximum number of bytes per character for the encoding used (buffer size
divided by that maximum). The latter is deduced from a charset id. Where
does he get that charset id?
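
To make sure we mean the same thing, here is a minimal Python sketch of how
I picture that derivation. It is not Milan's actual code. The function names
and the small lookup table are made up for illustration, the ids and widths
in the table are as I recall them from RDB$CHARACTER_SETS, and I'm assuming
a CHAR(n) UTF8 value arrives space-padded to n * 4 bytes.

```python
# Hypothetical lookup: charset id -> maximum bytes per character.
# Values as I recall them from RDB$CHARACTER_SETS; verify before relying on them.
MAX_BYTES_PER_CHAR = {
    0: 1,   # NONE
    4: 4,   # UTF8
    21: 1,  # ISO8859_1
}


def declared_length(buffer_bytes: int, charset_id: int) -> int:
    """Length in characters implied by the buffer size and the charset id."""
    return buffer_bytes // MAX_BYTES_PER_CHAR[charset_id]


def trim_utf8_field(raw: bytes, charset_id: int = 4) -> str:
    """Cut a space-padded UTF8 buffer down to its declared number of codepoints."""
    n_chars = declared_length(len(raw), charset_id)
    # Python strings are indexed by codepoint, so slicing trims by codepoints.
    return raw.decode("utf-8")[:n_chars]


# Assumed example: a CHAR(5) UTF8 column holding three 2-byte characters,
# padded with spaces to 5 * 4 = 20 bytes.
value = "\u00e5\u00e4\u00f6".encode("utf-8")
raw = value + b" " * (20 - len(value))
print(repr(trim_utf8_field(raw)))
```

Without the charset id the 20-byte buffer decodes to 17 characters (3 letters
plus 14 spaces); with it, the value can be cut back to the declared 5.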

>>2. Where does Milan's code get that info from?
>
> From conversion from UTF8 to UTF16 (I have a feeling that Milan's
> procedure can give incorrect results if it encounters a character with
> a 4-byte code, though I most likely am wrong).

Not really - that's how he does the trimming, not how he deduces the
correct length in codepoints.
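
On the 4-byte worry: a character that needs 4 bytes in UTF8 takes two UTF16
code units, so counting UTF16 units is not the same as counting codepoints.
A small Python sketch of the difference (the function names are mine, and
this says nothing about what FlameRobin actually does):

```python
def codepoint_count(raw: bytes) -> int:
    """Number of Unicode codepoints in a UTF-8 byte string."""
    return len(raw.decode("utf-8"))


def utf16_unit_count(raw: bytes) -> int:
    """Number of 16-bit code units the same text occupies in UTF-16."""
    return len(raw.decode("utf-8").encode("utf-16-le")) // 2


# U+1D11E (musical G clef) needs 4 bytes in UTF-8 and a surrogate pair in UTF-16.
sample = "a\U0001D11Eb".encode("utf-8")
print(len(sample), codepoint_count(sample), utf16_unit_count(sample))
# prints: 6 3 4
```

So a CHAR(3) value like this one would lose its last character if it were
trimmed to 3 UTF16 units instead of 3 codepoints.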

>>3. Why doesn't fbclient have that info?
>
> Because fbclient doesn't know what UTF8 is or how to convert it into
> UCS. The FB client is a rather simple tosser of data from transport
> packets into client structures and back.

Does fbclient know the charset id?

>>4. Can FB be changed so fbclient can get that info in the future, and
>>use it to trim the buffer to the right size before passing it to the
>>client application?
>
> Maybe, but so far nobody knows a good way to accomplish that.

Can fbclient be provided with the charset id?

>>The last one, item 4, would constitute "a real solution" imo.
>
> If you can propose a good solution - feel free to discuss it in
> Firebird-architect.

I'm afraid it's beyond my capabilities, but hopefully I can ask the
right questions to get some other people's grey cells spinning in a
creative direction. :-)

>>That should 1) reduce the number of client application bugs and 2) make
>>it easier for client application code to trim the buffer content
>>correctly.
>
> The number of application bugs can be reduced only by the application's
> developer. No point in doing someone else's job.

I think you're wrong here. Any author of a library or framework should
really strive to make the interfaces intuitive and easy to understand.
Anything else will cause a lot of frustration and a lot of unnecessary
application bugs.

But even if the answer to 4 is that there *is* an easy way to provide
fbclient with the necessary info to trim the result (i.e. the charset
id), I'm not convinced fbclient should be bloated with the required
capability to parse and trim all charsets available in Firebird.

Kjell