Subject Re: [firebird-support] Right-padded char fields?
Author Kjell Rilbe
Adriano dos Santos Fernandes wrote:
> If we have '<char that uses 2 bytes>' in a CHAR(3) UTF-8 column, the
> engine will return a buffer with '<2 bytes><10 spaces>'. We know that
> the valid bytes of this UTF-8 string are only the first four, but ISQL
> doesn't know how to handle this for arbitrary charsets and format the
> data in that charset.
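
To make the byte counts in that example concrete, here is a small Python
sketch. It assumes that the buffer for a UTF-8 CHAR(n) column is n * 4
bytes and is space-padded to its full size, which is what the
'<2 bytes><10 spaces>' description implies. The names are made up for
illustration and are not part of any Firebird API.

```python
# Assumption: a UTF-8 CHAR(n) buffer reserves 4 bytes per character
# and is padded with spaces up to that size.
MAX_BYTES_PER_CHAR = 4
n = 3

value = "\u00e9"                    # 'é', which takes 2 bytes in UTF-8
encoded = value.encode("utf-8")
buffer = encoded.ljust(n * MAX_BYTES_PER_CHAR, b" ")

print(len(encoded))                 # 2  -> the character itself
print(len(buffer))                  # 12 -> 2 bytes + 10 spaces
print(len(buffer.decode("utf-8")))  # 11 -> code points a naive client sees
print(repr(buffer[:4].decode("utf-8")))  # 'é  ' -> the 3 code points of CHAR(3)
```

So a client that simply decodes the whole buffer gets 11 characters for
a CHAR(3) column, while only the first four bytes hold the value.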

I don't know about the technicalities of course, but it seems to me that
the most logical thing would be for the engine itself to know which
character set is in use whenever it deals with character data, and to
act accordingly.

This means:

1. Any string that's supposed to be n codepoints should be n codepoints
and nothing else, regardless of character encoding.

2. The API should be updated to support this.

3. Clients shouldn't have to be aware of how the engine achieves this.

So, when returning UTF-8 encoded char(n) data, the engine should
truncate the string to the bytes that actually contain its n code
points. Whether this is done with null padding, a byte-length field in a
struct, or some other way is not for me to decide. I'm just stating what
would seem most logical and intuitive.
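
As a rough illustration of the truncation I mean, here is a Python
sketch. It only assumes that the buffer holds valid UTF-8; the function
name is hypothetical, and I'm not claiming this is how the engine or the
API would or should implement it.

```python
def valid_byte_length(buffer: bytes, n: int) -> int:
    """Return how many leading bytes of buffer hold the first n code points."""
    count = 0
    for i, b in enumerate(buffer):
        # In UTF-8, continuation bytes look like 10xxxxxx. Any other byte
        # starts a new code point.
        if (b & 0xC0) != 0x80:
            if count == n:
                return i          # byte i starts code point n+1, so stop here
            count += 1
    return len(buffer)            # buffer holds n code points or fewer


# CHAR(3) holding one 2-byte character, space-padded to 12 bytes.
buf = "\u00e9".encode("utf-8") + b" " * 10
size = valid_byte_length(buf, 3)
print(size)                       # 4
print(len(buf[:size].decode("utf-8")))  # 3
```

Whether the result is reported as a length field or applied by
terminating the buffer early, the client would then see exactly n code
points.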

Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB