Subject: Re: [firebird-support] Bug with character sets
Author: Martijn Tonies
>>> From the SQLVAR structure, but your assumption is wrong. One can't get
>>> the right string length from the buffer size and the number of bytes
>>> per character.
>>
>> Yes you can. That's what Milan's code does. For char(N) columns, the
>> "right string length" in codepoints is N, but the buffer returned from
>> fbclient can contain anything from N to 4N codepoints for a char(N) utf8
>> column. That's not intuitive, and that's what imho needs to be addressed
>> somehow.
>
> Right, the buffer can contain anything from N to 4N, but finding out how
> many exactly is not a trivial task. It is more complicated than
> (SizeOfBuffer/BytesPerCharacter).
>
>> You seem to think that Milan's code does both in the same step by
>> converting to UTF16. This is not correct. If you have a buffer like this:
>> N
>> O
>> <space>
>> <space>
>> <space>
>> <space>
>> <space>
>> <space>
>> for a char(2) utf8 column, then a conversion of that buffer into utf16
>> would result in a 16 byte string containing 8 codepoints. You still
>> wouldn't know N or how to determine N. Utf16 does handle codepoints that
>> are 4-byte in utf8. Utf16 uses surrogate pairs, so even in utf16 not all
>> codepoints are two bytes. Some are four bytes.
>
> Ok, but look here: for CHAR(N) you can find out N as
> (SizeOfBuffer/BytesPerCharacter). Note that fbclient has no idea about
> BytesPerCharacter. To get the right string you must cut off (N-RealLength)
> trailing spaces (in your example you must cut off 6 spaces).
> But how can one determine "RealLength"? The only way is to scan the buffer
> and count codepoints, which in UTF8 can have a size from 1 to 4 bytes.
> Milan's code is aware of UTF8 and can calculate RealLength; fbclient
> isn't.
>
> And anyway: will it help much if instead of
> ('N','O',' ',' ',' ',' ',' ',' ') the buffer will contain
> ('N','O',\0,\0,\0,\0,\0,\0)?..

RealLength is determined by using a server-side "bytes per character"
value: the full (space-padded) buffer is translated to a string, which is
then too long, and N is determined from the buffer size with the bytes
per character trick (SizeOfBuffer/BytesPerCharacter). The client needs to
query a server-side table for that! How much sense does that make? Why
not pass N to the client? The server -knows- this!!
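
To make the trick concrete, here is a minimal Python sketch of what a
client has to do today. It is only an illustration under stated
assumptions, not Milan's code and not anything from fbclient: the
function name trim_char_n and the parameter bytes_per_char are made up
for this example, and bytes_per_char stands in for the value the client
would have to look up on the server (4 for UTF8).

```python
def trim_char_n(buffer: bytes, bytes_per_char: int = 4) -> str:
    """Recover the value of a CHAR(N) UTF8 column from its padded buffer.

    Hypothetical helper: bytes_per_char is assumed to have been fetched
    from the server beforehand; fbclient does not supply it.
    """
    # N, the declared column length in characters.
    n = len(buffer) // bytes_per_char
    # Decode the whole space-padded buffer. The result holds anywhere
    # from N to 4N codepoints, so it is usually too long.
    text = buffer.decode("utf-8")
    # Python strings are indexed by codepoint, so this keeps exactly N.
    return text[:n]


# The CHAR(2) example from this thread: 'N', 'O' and six padding spaces.
padded = b"NO" + b" " * 6
value = trim_char_n(padded)
```

Note that the slice has to be done in codepoints. Doing the same cut on a
UTF16 string would need extra care, because codepoints that take 4 bytes
in UTF8 become surrogate pairs there.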

With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com