Subject | Re: [firebird-support] UTF8 in firebird ? |
---|---|
Author | Lester Caine |
Post date | 2012-01-06T23:28:16Z |
Vander Clock Stephane wrote:
> Keep utf8 like it is if you want, but why not add a new charset like
> UTF8_SVDC that is completely equal to UTF8 except that it's considered that
> when I write varchar(250) = 250 bytes (or 250 code points if you prefer)?

As I have already said ... Unicode needs up to 21 bits per code point, which is
up to 4 bytes per character in UTF-8 in the worst case ... and the speed cost
of accommodating that is well worth the reliability in handling ANY
international character.
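
As a rough check on the storage cost, here is a minimal Python sketch (not Firebird code) that counts the bytes UTF-8 uses for a few sample characters and the worst-case byte budget for a 250-character column. The helper name `utf8_len` and the sample characters are made up for illustration; the 4-byte maximum comes from the UTF-8 definition.

```python
# Sketch: bytes needed per character in UTF-8, and the worst case for
# a 250-character column. Names here are illustrative assumptions.

def utf8_len(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

# Sample characters written as escapes so the source stays plain ASCII.
samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter with acute accent",
    "\u65e5": "CJK ideograph",
    "\U0001F600": "character outside the 16-bit range",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} ({label}): {utf8_len(ch)} byte(s)")

# UTF-8 never needs more than 4 bytes for one code point.
MAX_BYTES_PER_CHAR = 4
worst_case_bytes = 250 * MAX_BYTES_PER_CHAR
print("worst case for 250 characters:", worst_case_bytes, "bytes")
```

So a column declared as 250 characters has to be budgeted at up to 1000 bytes, even though pure ASCII content would fit in 250.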
I simply can't understand why you want to add yet another 256-character
collation, when there are plenty, INCLUDING the one you seem to think is missing.
THAT is not Unicode and would need to be translated to UTF8 if THAT is what is
required externally. The first code space in Unicode (the Basic Multilingual
Plane) is 16 bit, and trying to cram that into 8 bits is exactly what character
sets were designed for.
http://www.destructor.de/firebird/charsets.htm is a nice summary of the
available collations, and if one of them is not suitable, then adding another is
not difficult.
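
To illustrate the point about translation, here is a small Python sketch, assuming ISO 8859-1 ("latin-1" in Python) as the single-byte character set. It shows that single-byte data has to be transcoded before it can be handed out as UTF-8, and that text outside the 256-character repertoire does not fit at all. The function names are hypothetical and this is not how Firebird itself performs the conversion.

```python
# Sketch: single-byte character set versus UTF-8, using Python codecs.
# "latin-1" stands in for a single-byte set; the helper names are invented.

def fits_in_charset(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True

def transcode_to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode bytes stored in a single-byte charset and re-encode as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# "cafe" with an accented e: one byte per character in latin-1.
latin1_bytes = "caf\u00e9".encode("latin-1")
utf8_bytes = transcode_to_utf8(latin1_bytes, "latin-1")
print(len(latin1_bytes), "bytes in latin-1,", len(utf8_bytes), "bytes in UTF-8")

# Two CJK ideographs cannot be stored in a 256-character set.
print(fits_in_charset("\u65e5\u672c", "latin-1"))
```

The accented letter grows from one byte to two on the way to UTF-8, which is why the two encodings cannot simply be treated as the same thing with a different length rule.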
Personally I've now moved from single-byte character sets to Unicode simply
because I don't know what my customers are going to use. I accept that this
WILL be slower when searching for a name or address, but it allows translation
of the content to any language. I only speak English so I could simply ignore
the rest of the world, but a growing percentage of orders are coming from Japan,
China and the like, and their address data just goes straight into the database
without any worry.
--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php