Subject | Re: [firebird-support] Writing UTF16 to the database
---|---
Author | Olivier Mascia
Post date | 2005-02-21T16:39:22Z
Ann,
On 21 Feb 2005, at 17:09, Ann W. Harrison wrote:
> Adriano dos Santos Fernandes wrote:
>>
>> WhatsNew of new INTL is here:
>> http://cvs.sourceforge.net/viewcvs.py/firebird/firebird2/doc/WhatsNew?rev=1.45.2.3&only_with_tag=B2_0_intl&view=auto
>>
>> Allowing the use of UTF16 in columns isn't a difficult task, but it
>> is deactivated because it isn't complete.
>> Allowing UTF16 as the connection charset is difficult and hasn't yet
>> been started.
>
> Is it necessary to store different character representations in the
> database? Could we not choose some Unicode representation and store
> only that, translating in and out as appropriate?
This is exactly what I tried to say, in less understandable
statements, 2 or 3 days ago. So I'm very glad to see the same idea
popping up in your mind. ;-)
The engine would store a single Unicode representation; the character
set qualifications at the DDL level, or the string introducers, would
just be hints telling the engine interfaces to translate in and out as
appropriate.
I would advocate a storage representation using UTF-8.
Pure 7-bit ASCII characters would use a single byte each.
Most of the common accented characters of many European languages
would use 2 bytes.
Most Asian characters would need 3 bytes, and some would need 4.
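To make those sizes concrete, here is a minimal sketch in plain
standard C++ (the sample characters are arbitrary, just one from each
group) that prints the UTF-8 byte length of each:

#include <cstdio>
#include <cstring>

int main()
{
    // One hypothetical sample per size class, written out as its
    // UTF-8 byte sequence.
    const char* samples[][2] = {
        { "A",                "U+0041  'A' (7-bit ASCII)"          },
        { "\xC3\xA9",         "U+00E9  'e acute' (accented Latin)"  },
        { "\xE4\xB8\xAD",     "U+4E2D  CJK ideograph"               },
        { "\xF0\x9D\x84\x9E", "U+1D11E musical symbol G clef"       }
    };

    for (const auto& s : samples)
        std::printf("%-38s -> %zu byte(s) in UTF-8\n",
                    s[1], std::strlen(s[0]));
    return 0;
}

Compiled with any C++11 compiler, it prints 1, 2, 3 and 4 bytes
respectively.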
Besides these considerations, UTF-8 is the default representation for
XML (any XML processor should be ready to process and produce UTF-8 at
the very least).
With a good string class, handling strings encoded in UTF-8 is very
easy.
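For instance, here is a sketch that assumes no particular string class
at all: counting code points in a UTF-8 buffer only needs to skip the
continuation bytes (the ones of the form 10xxxxxx):

#include <cstdio>
#include <string>

// Number of Unicode code points in a UTF-8 encoded string: every byte
// that is not a continuation byte starts a new code point.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++count;
    return count;
}

int main()
{
    const std::string s = "fa\xC3\xA7" "ade";   // "façade" in UTF-8
    std::printf("%zu bytes, %zu code points\n", s.size(), utf8_length(s));
    return 0;
}

A real string class would of course add iteration, comparison and
conversion on top of this; the point is only that the variable-length
encoding is not hard to deal with.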
At some point, this will have to come up on the Architect list, I
think.
--
Olivier Mascia