Subject | Re: [Firebird-Architect] Re: [firebird-support] Writing UTF16 to the database |
---|---|
Author | Lester Caine |
Post date | 2005-02-28T10:19:54Z |
Olivier Mascia wrote:
> Absolutely right. And look at the convolutions Microsoft did in Win32
> (XP versions at least) to add GB18030 over their 16 bits unicode
> representation. True, GB18030 will hit four bytes represented in UTF-8.
> But it will also hit 4 bytes represented in UTF-16. By now people will
> have understood I'm an advocate of UTF-8 as internal (db storage)
> universal representation of all strings. ;-)

Since what is stored internally can be compressed, what is stored does
not matter, it's how it is managed that is the problem. What we ideally
need is a fixed length record that we can do all of the string
operations on, in which case 3 bytes per character covers every
eventuality, but 4 bytes may be even more practical IN THE STRING CLASS?
This would be in addition to the current single byte processing, and I
think we have eliminated the need for any variable multibyte mechanism
INTERNALLY. Just convert UTF8, UTF16 and UTF32 to the internal three
byte string class and work all of the processing on fixed length
strings. If you don't need Unicode, then the multibyte character code is
not needed, and we just get single byte character strings without any
overheads?
Most of the current problems being highlighted come about because we do
not have a fixed character length to work to?
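
To make the fixed-length idea above concrete, here is a minimal sketch in Python. It is not Firebird code: the names `to_fixed3`, `from_fixed3`, `char_at` and `char_length` are hypothetical, and the big-endian three-byte layout is only an assumption for illustration. It relies on the fact that the highest Unicode code point, U+10FFFF, fits in 21 bits and therefore in three bytes.

```python
WIDTH = 3  # assumed internal width: 3 bytes per code point (U+10FFFF < 2**24)

def to_fixed3(data: bytes, encoding: str) -> bytes:
    """Decode external bytes and pack every code point into exactly 3 bytes."""
    text = data.decode(encoding)
    return b"".join(ord(ch).to_bytes(WIDTH, "big") for ch in text)

def from_fixed3(buf: bytes, encoding: str) -> bytes:
    """Unpack the fixed-width buffer and re-encode it for the outside world."""
    chars = (chr(int.from_bytes(buf[i:i + WIDTH], "big"))
             for i in range(0, len(buf), WIDTH))
    return "".join(chars).encode(encoding)

def char_length(buf: bytes) -> int:
    """Character count is a plain division, no scanning needed."""
    return len(buf) // WIDTH

def char_at(buf: bytes, index: int) -> str:
    """Constant-time access to the n-th character."""
    start = index * WIDTH
    return chr(int.from_bytes(buf[start:start + WIDTH], "big"))

# One character each of 1, 2, 3 and 4 bytes in UTF-8; the last one (U+20000)
# also needs 4 bytes (a surrogate pair) in UTF-16.
sample = "a\u00e9\u4e2d\U00020000"

from_utf8 = to_fixed3(sample.encode("utf-8"), "utf-8")
from_utf16 = to_fixed3(sample.encode("utf-16-le"), "utf-16-le")
from_utf32 = to_fixed3(sample.encode("utf-32-le"), "utf-32-le")
```

Whichever external encoding the text arrives in, the internal buffer comes out identical, and length and indexing become simple arithmetic. The cost is three bytes for every character, including plain ASCII, which is why this would sit alongside the existing single byte processing rather than replace it.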
--
Lester Caine
-----------------------------
L.S.Caine Electronic Services