Subject Re: [firebird-support] Bug with character sets
Author Kjell Rilbe
Martijn Tonies wrote:
> > Firebird returns the character set ID as part of the result. You need
> > to read the field RDB$BYTES_PER_CHARACTER from table RDB$CHARACTER_SETS
> > for that RDB$CHARACTER_SET_ID and then divide the reported length by
> > this number. When done, use the result to truncate the returned string
> > to that many characters. Example:
> >
> > select 'Y' from rdb$database;
> >
> > returned buffer: 'Y   ' (Y followed by three spaces)
> > returned length: 4
> > returned charset ID: 4 (UTF8)
> >
> > select RDB$BYTES_PER_CHARACTER
> > from RDB$CHARACTER_SETS
> > where RDB$CHARACTER_SET_ID = 4;
> >
> > real length in characters = 4/4 = 1
> > truncate the result to 1 character to get 'Y'
> >
> > In FlameRobin we read in all the character set lengths when connecting
> > to the database, and later just reuse that info.
>
> Could be me, but isn't the -database server- supposed to return this
> character data correctly? If there's a Y stored in UTF8 format, it should
> return a Y in UTF8 format. Why does it return 'Y<<<' (where < is space)

It returns a C-style struct with a data buffer that's large enough to
hold the maximum number of bytes that column could require. So, for a
column of 2 UTF8 chars it reserves an 8-byte buffer (2 chars x 4 bytes
per char) and specifies that buffer size in the struct. The buffer
contains the character data in the correct encoding, padded with spaces.
How much of the buffer should actually be used as character data has to
be calculated as described by Milan.
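
To make the calculation concrete, here is a minimal sketch in Python of what a client could do with such a buffer. This is not code from FlameRobin or from any Firebird client library: the function name trim_fixed_char and the BYTES_PER_CHAR dictionary are made up for illustration, and the dictionary simply stands in for values you would read from RDB$CHARACTER_SETS once at connect time. The only entry shown is the one from Milan's example (charset ID 4, UTF8, 4 bytes per character).

```python
# Hypothetical stand-in for the cached result of
#   select RDB$CHARACTER_SET_ID, RDB$BYTES_PER_CHARACTER
#   from RDB$CHARACTER_SETS
# Only the UTF8 entry from the example above is filled in.
BYTES_PER_CHAR = {4: 4}


def trim_fixed_char(buffer: bytes, reported_length: int, charset_id: int,
                    encoding: str = "utf-8") -> str:
    """Cut a space-padded buffer down to the declared number of characters.

    Hypothetical helper: the encoding name is passed in by the caller
    because mapping a charset ID to a codec is outside this sketch.
    """
    bytes_per_char = BYTES_PER_CHAR[charset_id]
    # Real length in characters = reported byte length / max bytes per char.
    char_count = reported_length // bytes_per_char
    # Decode the whole buffer first (the padding is ordinary spaces), then
    # truncate by characters, not bytes, so multi-byte characters stay intact.
    text = buffer[:reported_length].decode(encoding)
    return text[:char_count]


# The example from above: buffer 'Y' plus three spaces, length 4, charset 4.
print(repr(trim_fixed_char(b"Y   ", 4, 4)))
```

Note that the truncation is done after decoding. Cutting the raw buffer at a byte offset could split a multi-byte character in the middle.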

Why it's done like this, you'll have to ask the FB guys.

Kjell
--
------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell.rilbe@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64

