Subject Re: [firebird-support] Bug with character sets
Author Dimitry Sibiryakov
>> From SQLVAR structure, but your assumption is wrong. One can't get
>> right string length from buffer size and number of bytes per character.
>
> Yes you can. That's what Milan's code does. For char(N) columns, the
> "right string length" in codepoints is N, but the buffer returned from
> fbclient can contain anything from N to 4N codepoints for a char(N) utf8
> column. That's not intuitive, and that's what imho needs to be addressed
> somehow.

Right, the buffer can contain anything from N to 4N, but finding out
exactly how many is not a trivial task. It is more complicated than
(SizeOfBuffer/BytesPerCharacter).
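
To illustrate the point, here is a small Python sketch (not fbclient code;
the names count_codepoints, ascii_buf and wide_buf are made up for this
example). It assumes an 8-byte buffer for a CHAR(2) UTF8 column and shows
that the same buffer size can hold either 8 or 2 codepoints, so the
division alone says nothing about the actual content.

```python
def count_codepoints(buf: bytes) -> int:
    """Count UTF-8 codepoints by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in buf if (b & 0xC0) != 0x80)


# Assumed layout for a CHAR(2) UTF8 column: 2 characters * 4 bytes each.
buffer_size = 8
bytes_per_character = 4
declared_length = buffer_size // bytes_per_character  # this is N

# 'N', 'O' and six padding spaces: every codepoint takes one byte.
ascii_buf = "NO".encode("utf-8").ljust(buffer_size, b" ")

# Two codepoints outside the BMP: every codepoint takes four bytes.
wide_buf = "\U0001F600\U0001F601".encode("utf-8")

print(declared_length, count_codepoints(ascii_buf), count_codepoints(wide_buf))
```

Both buffers are 8 bytes long and both belong to a CHAR(2) column, yet the
number of codepoints they hold differs.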

> You seem to think that Milan's code does both in the same step by
> converting to UTF16. This is not correct. If you have a buffer like this:
> N
> O
> <space>
> <space>
> <space>
> <space>
> <space>
> <space>
> for a char(2) utf8 column, then a conversion of that buffer into utf16
> would result in a 16 byte string containing 8 codepoints. You still
> wouldn't know N or how to determine N. Utf16 does handle codepoints that
> are 4-byte in utf8. Utf16 uses surrogate pairs, so even in utf16 not all
> codepoints are two bytes. Some are four bytes.

Ok, but look here: for CHAR(N) you can find out N as
(SizeOfBuffer/BytesPerCharacter). Note that fbclient has no idea about
BytesPerCharacter. To get the right string you must cut off (N-RealLength)
trailing spaces (in your example you must cut off 6 spaces).
But how can one determine "RealLength"? The only way is to scan the buffer
and count codepoints, which in UTF8 can be from 1 to 4 bytes in size.
Milan's code is aware of UTF8 and can calculate RealLength; fbclient
isn't.
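
A rough Python sketch of such a scan follows. It is not Milan's code and
not part of fbclient; the function names and the default of 4 bytes per
character are assumptions for illustration. It reads the example as: the
8-byte buffer holds 8 codepoints, N is 2, so everything after the first
2 codepoints is padding. The buffer is assumed to be well-formed UTF-8.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the byte length of a UTF-8 sequence from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead)


def char_n_payload(buf: bytes, bytes_per_character: int = 4) -> bytes:
    """Keep the first N codepoints of a CHAR(N) buffer, drop the padding."""
    declared = len(buf) // bytes_per_character  # N
    offset = 0
    for _ in range(declared):
        # Advance one whole codepoint, whatever its size in bytes.
        offset += utf8_sequence_length(buf[offset])
    return buf[:offset]


print(char_n_payload(b"NO      "))
```

The client has to know both the encoding and BytesPerCharacter to do this,
which is exactly the information fbclient does not have.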

And anyway: would it help much if, instead of
('N','O',' ',' ',' ',' ',' ',' '), the buffer contained
('N','O',\0,\0,\0,\0,\0,\0)?..

SY, SD.