Subject Re: [firebird-support] Bug with character sets
Author Martijn Tonies
>> The client has to determine the number of characters by dividing the buffer
>> size by "max bytes per character". For CHAR(2) data in UTF8, the buffer size
>> will be 8 bytes and max bytes per character is 4, so the data should be
>> trimmed to 2 characters (code points).
>
> That is not an easy task if you have larger strings. The client library
> knows only the length of the data in bytes (without trailing blanks; they
> are cut off by the server to save network traffic). So, if you have a
> CHAR(100) field (buffer size = 400 bytes) and receive from the server a
> piece of data 10 bytes long, how many spaces must you add to the tail?
> Remember that these 10 bytes may be 10 ASCII characters, or 5 non-ASCII
> characters, or 2 ancient Egyptian hieroglyphs + 1 character from the base
> plane. The task is easier for components or external drivers that are
> aware of Unicode: they can transform the data into UCS4 and then cut it to
> 100 characters, but for fbclient this mission is impossible for now.
>
> SY, SD.

I totally fail to see why the client library knows nothing about this. Isn't
it the client library that is the "glue" between the network protocol and
the client application? Yes, it is, so it should present properly encoded
character strings to the client application.
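
To make the arithmetic concrete, here is a minimal sketch of what a Unicode-aware layer could do with the information discussed above. It assumes the layer knows the buffer size and the "max bytes per character" of the field, and that the server sent valid UTF-8 with trailing blanks removed. The function name pad_char_field and its parameters are hypothetical; this is not fbclient code or any existing driver API.

```python
def pad_char_field(data: bytes, buffer_size: int, max_bytes_per_char: int = 4) -> str:
    """Restore a CHAR(n) value to its declared length in code points.

    Hypothetical helper for illustration only, not part of fbclient.
    """
    # Declared length in characters, e.g. 400 // 4 = 100 for CHAR(100) in UTF8.
    declared_chars = buffer_size // max_bytes_per_char
    # Decoding turns bytes into code points; a Python str is indexed by code point.
    text = data.decode("utf-8")
    # Cut to the declared length, then pad with spaces up to that length.
    return text[:declared_chars].ljust(declared_chars)


# Three different 10-byte payloads for a CHAR(100) field (400-byte buffer).
samples = [
    "abcdefghij".encode("utf-8"),             # 10 ASCII characters
    ("\u00e9" * 5).encode("utf-8"),           # 5 two-byte characters
    ("\U00013000" * 2 + "\u00e9").encode("utf-8"),  # 2 hieroglyphs + 1 two-byte character
]

for raw in samples:
    padded = pad_char_field(raw, buffer_size=400)
    spaces_added = len(padded) - len(raw.decode("utf-8"))
    print(len(raw), "bytes ->", len(padded), "characters,", spaces_added, "spaces added")
```

All three payloads are 10 bytes long, yet they need 90, 95 and 97 trailing spaces respectively, which is why the byte length alone is not enough and the data has to be decoded before it is padded.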


With regards,

Martijn Tonies
Upscene Productions
http://www.upscene.com
