Subject: Re: [firebird-support] Writing UTF16 to the database
Author: Scott Morgan
Post date: 2005-02-22T23:40:44Z
Brad Pepers wrote:
> Doing so would have many benefits with at least some hurdles:
>
>1. Depending on the Unicode format used internally (UTF-8/16/32) this
>could have a high penalty on string sizes.
>
I think UTF-16 is the best compromise; hell, disk space is cheap, so
UTF-32 is reasonable. But overall it doesn't really matter, what's
important is getting the data in and out in a useful form and the speed
of the engine.
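
As a rough illustration of the size trade-off (a Python sketch; the
sample string and encodings are my own choice, nothing to do with the
engine):

s = "naïve 日本語"  # nine characters, mixed Latin and Japanese
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, len(s.encode(enc)), "bytes")
# utf-8:     16 bytes (ASCII stays at 1 byte each, CJK takes 3)
# utf-16-le: 18 bytes (2 bytes per character here, no surrogate pairs)
# utf-32-le: 36 bytes (a flat 4 bytes per character)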
>2. Also doing this will likely require that there is a string class that
>uses Unicode internally and this would make the class a heavier
>implementation
>
A unicode string class isn't really that much more 'heavy' than a normal
string class. The hardest bit is handling multi-byte situations, but then
we already have that with many of the existing encodings.

There is a UCS2 (UTF-16) implementation in the source
(src/intl/lc_unicode_ucs2.c) which is used internally for...
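
To show what I mean by the multi-byte headache (another Python sketch,
nothing Firebird-specific): any byte-oriented operation can land in the
middle of a character.

data = "café".encode("utf-8")  # 4 characters but 5 bytes: é takes 2
chopped = data[:4]             # a naive byte slice cuts é in half
print(chopped.decode("utf-8", errors="replace"))  # 'caf\ufffd' - mangled
# A strict .decode("utf-8") on the chopped bytes raises UnicodeDecodeError.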
>3. You would need a new system to convert from/to Unicode from supported
>character sets so it would largely replace the existing collation
>sequences and such which is a large job I'm sure.
>
When transcoding from one set to another, if there isn't a direct route
available, the text is transcoded to UCS2 and then to the target encoding.

http://www.ibphoenix.com/main.nfs?a=ibphoenix&l=;PAGES;NAME='ibp_collation'
(section titled 'Two Conversion Objects')
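
The pivot idea in miniature (a Python sketch; here str just stands in
for the UCS2 intermediate form):

cp1252_bytes = b"\x93fancy\x94"        # curly quotes as windows-1252 bytes
pivot = cp1252_bytes.decode("cp1252")  # up to the common intermediate form
target = pivot.encode("utf-16-le")     # down again to the target charset
# No direct cp1252-to-UTF-16 table is needed; everything routes via the pivot.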
>4. Comparison and collation with Unicode is not a trivial problem to
>solve.
>
Collation is a PITA no matter what (it's worrying how many devs I've met
who are totally ignorant of the various cultural differences in text
handling, not least of which is sort orders). But there is an open
source project that can, at the very least, help:

http://icu.sourceforge.net/

IIRC the dev team are aware of this project and plan to use it.
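
A small taste of why a straight byte compare isn't enough (a Python
sketch; assumes a de_DE.UTF-8 locale is installed on the box):

import locale

words = ["Zebra", "Äpfel", "Apfel"]
print(sorted(words))  # plain code-point order puts Äpfel after Zebra

locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
print(sorted(words, key=locale.strxfrm))  # German order: Apfel, Äpfel, Zebra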
>5. If the key of an index is a string, how is it handled now? I was
>under the impression it was stored in a binary format that is expected
>to compare properly using a straight binary comparison.
>
Although there are many ways you can encode certain glyphs in the
various unicode systems, there are standards to normalise them which
should enable binary matching:

http://www.unicode.org/reports/tr15/
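
For example, NFC normalisation (from the TR15 report above) makes the
two encodings of é binary-equal again (Python's unicodedata shown just
as a sketch):

import unicodedata

a = "\u00e9"   # é as one precomposed code point
b = "e\u0301"  # é as e + a combining acute accent
print(a == b)  # False: same glyph, different code-point sequence
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True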
Scott