Subject | Re: [firebird-support] Bug with character sets |
---|---|
Author | Martijn Tonies |
Post date | 2009-05-20T06:11:33Z |
>> Dimitry Sibiryakov wrote:
>>>> I totally fail to see why the client library knows nothing about
>>>> this? Isn't it the client library that is the "glue" between the
>>>> network protocol and the client application? Yes, it is, so it
>>>> should present properly encoded character strings to the client
>>>> application.
>>>
>>> It is "glue", right, but network protocol does not include actual
>>> data length in characters, only in bytes.
>>
>> But you *do* get info about charset id, right? And from that you
>> (fbclient) can determine the actual field size in characters using the
>> method Milan described. From that, you would have to be able to parse
>> the data in the encoding used to find the byte size of the actual
>> data.
>> This last step would probably require a lot of "knowledge" in fbclient
>> of all possible character encodings, which I expect would bloat that
>> dll a bit more than anyone would like. Or would it?
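
To make the two quantities concrete, here is a minimal Python sketch of the
arithmetic discussed above. The lookup tables and function names are
hypothetical and not part of fbclient. It assumes that the "method Milan
described" amounts to dividing the reported byte size by the maximum number
of bytes per character for the character set (4 for UTF8, 1 for ISO8859_1).

```python
# Hypothetical lookup tables for illustration only; a real client would
# need one entry per character set the server supports.
MAX_BYTES_PER_CHAR = {"ISO8859_1": 1, "UTF8": 4}
PYTHON_CODEC = {"ISO8859_1": "latin-1", "UTF8": "utf-8"}


def declared_char_length(byte_size, charset):
    """Field size in characters, derived from the reported size in bytes."""
    return byte_size // MAX_BYTES_PER_CHAR[charset]


def actual_char_length(raw, charset):
    """Number of characters in the data; this needs to decode the bytes."""
    return len(raw.decode(PYTHON_CODEC[charset]))


# One single-byte character plus one double-byte character in UTF-8.
sample = "a\u00e9".encode("utf-8")
print(len(sample))                          # size in bytes
print(actual_char_length(sample, "UTF8"))   # size in characters
print(declared_char_length(16, "UTF8"))     # char(4) reported as 16 bytes
```

The first function only needs a small table, while the second needs a decoder
for every encoding, which is the "knowledge" that might bloat the library.
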
>
> Or just have a character length field in the data you receive and have
> the server calculate this using the character set information it must
> already have so on the client side I just get data I can use without
> jumping through hoops!
>
> Actually thinking now I'm not even sure I understand exactly what kind
> of data I'll get back in different situations. Let's say when I
> create the database I set the default character set to UTF8. I create
> a table with two char(4) columns and on one of them I specify a
> character set of ISO8859_1 and leave the other to use the default.
> Then I create another two columns using varchar(4) this time and again
> specify ISO8859_1 for one of them and leave the other to use UTF8. I
> insert some data into the table. The UTF8 columns get a string with a
> single byte character and a double byte character so the string length
> is 2 and the bytes used is 3. The ISO8859_1 columns just get a two
> character/byte string.
>
> Now I connect from a client using the C API and specify I'll be using
> UTF8 as the connection character set. I select from the table. What
> do I get? Does the server convert all the column data to UTF8? Or do
> I need to look at the char set id set for each column and if it's not
> UTF8 I have to do a conversion from what it's using to UTF8? From
> what I've seen the char(4) columns will return a size of 4 (for
> the ISO8859_1 column) or 16 (for the UTF8 column) but what will the
> varchar(4) columns return? With the UTF8 column will it return the
> byte size of 3 bytes or the character length of 2 UTF8 characters?
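
The sizes in this scenario can be worked through in a few lines of Python.
This is only a sketch of the byte arithmetic, not of what the server actually
sends: it assumes a char(4) UTF8 value arrives space-padded to the full 16
bytes mentioned above, and the helper names are made up for the example.

```python
def pad_char_column(value, byte_size, codec="utf-8"):
    """Encode a value and space-pad it to the reported byte size (assumed)."""
    raw = value.encode(codec)
    return raw + b" " * (byte_size - len(raw))


def trim_to_declared_length(raw, declared_chars, codec="utf-8"):
    """Decode a padded buffer and keep only the declared character count."""
    return raw.decode(codec)[:declared_chars]


value = "a\u00e9"                    # one 1-byte and one 2-byte character

# varchar(4): only the data itself, 3 bytes holding 2 characters.
varchar_bytes = value.encode("utf-8")

# char(4) in UTF8: 4 characters * 4 bytes = 16 bytes once padded.
char_buffer = pad_char_column(value, 16)

# Decoding all 16 bytes gives 15 characters, more than the 4 declared,
# so the client has to cut the text back to the declared length itself.
char_text = trim_to_declared_length(char_buffer, 4)
print(len(varchar_bytes), len(char_buffer), repr(char_text))
```

Under these assumptions the client needs the declared length in characters,
not just the byte size, to recover a correct char(4) value.
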
>
> Is there a document on all this that I've failed to find?
Good question. Given that each column can have a different character set,
what does the client receive? How does this work with the connection
character set? Is there a "guide" somewhere that explains all of this?

With regards,
Martijn Tonies
Upscene Productions
http://www.upscene.com