Subject | RE: [Firebird-Architect] Re: UTF-8 vs UTF-16 |
---|---|
Author | Dimitry Sibiryakov |
Post date | 2003-08-25T06:56:32Z |
On 24 Aug 2003 at 11:48, David Schnepper wrote:
>I think, for Firebird, that UNICODE-16 support should be put
>in, and if people want to store the supplemental characters
>into it, well, it works for point c above, it should work
>for point b -- (as I doubt there are any other character
>sets that encode the characters other than how Unicode
>supplemental would) - and it doesn't work for point a.
I think that the Firebird engine could (and should) become completely
UNICODE-16. No zoo of charsets, no multibyte encodings with a sword
of Damocles of buffer overflows.
Charsets play a part on the client side only. And AFAIK a client
uses only one charset at a time, and that charset is determined by
its locale. Let's put the UNICODE->desired charset conversion on the
client side, where it can be done by system calls.
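
A minimal sketch of what such a client-side conversion could look like, assuming the server hands over UTF-16LE bytes. The function name `utf16_to_client_charset` and the choice of UTF-16LE on the wire are assumptions made for illustration only; they are not part of Firebird.

```python
import locale
from typing import Optional


def utf16_to_client_charset(wire_bytes: bytes, charset: Optional[str] = None) -> bytes:
    """Convert UTF-16LE text from the server into the client's charset.

    Hypothetical helper: if no charset is given, the one implied by the
    client's locale is used.
    """
    if charset is None:
        # Ask the operating system which encoding the current locale prefers.
        charset = locale.getpreferredencoding(False)
    text = wire_bytes.decode("utf-16-le")
    # Characters the client charset cannot represent become "?".
    return text.encode(charset, errors="replace")


if __name__ == "__main__":
    from_server = "Жук".encode("utf-16-le")
    print(utf16_to_client_charset(from_server, "cp1251"))
```

The server never needs to know the client charset in this arrangement; the cost is that unrepresentable characters are lost on the client rather than rejected on the server.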
The only language-aware variable thing left on the server is
sorting. I don't know languages such as French and Spanish and can't
tell whether the same characters can take different positions in
their sorting orders. Probably even sorting can be done according to
one char-position table.
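
To make the "one char-position table" idea concrete, here is a rough sketch. The table `ORDER` is invented for the example and covers only a handful of Latin letters; it is not a real collation.

```python
# Hypothetical single table: every character has exactly one fixed position.
ORDER = "aàâbcçdeéèêfghijklmnñoôpqrstuvwxyz"
POSITION = {ch: i for i, ch in enumerate(ORDER)}


def sort_key(s: str) -> list:
    """Map each character to its table position.

    Characters missing from the table sort after all listed ones,
    in code point order.
    """
    return [POSITION.get(ch, len(ORDER) + ord(ch)) for ch in s.lower()]


if __name__ == "__main__":
    words = ["nube", "ñu", "nada", "côte", "cote", "coté"]
    print(sorted(words, key=sort_key))
    # prints ['cote', 'coté', 'côte', 'nada', 'nube', 'ñu']
```

The sketch only shows what such a table would look like. Whether one table can serve every language is the open question: Swedish, for instance, places ä after z, while German sorts it together with a.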
>Here's my thoughts on the project:
>
>Define a UNICODE_BE and UNICODE_LE character set.
>Define a character set alias UNICODE that goes to
>the proper character set, on a platform specific
>basis.
Wouldn't it be better to define one UNICODE charset and two (or
three) lc_types? Nobody cares how characters are stored on the server.
The only thing that matters is how they are put into the user's buffer.
>Wire format wouldn't need modification -- client
>would request UNICODE_BE format as part of dpb_lc_ctype,
>server would transliterate _LE into _BE for the client.
Why server? Client can do it as well.
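
For what it's worth, the _LE to _BE step is just a swap of the two bytes in every 16-bit unit, so either side can do it cheaply. A small sketch follows; the function name `swap_utf16_byte_order` is made up for the example.

```python
def swap_utf16_byte_order(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit (LE <-> BE)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]  # old high/low bytes move to even offsets
    swapped[1::2] = data[0::2]  # and the other byte of each pair to odd offsets
    return bytes(swapped)


if __name__ == "__main__":
    little = "Firebird".encode("utf-16-le")
    assert swap_utf16_byte_order(little) == "Firebird".encode("utf-16-be")
```

Surrogate pairs need no special handling here, because each 16-bit unit is swapped on its own.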
SY, Dimitry Sibiryakov.