Subject | Re: [firebird-support] Re: cannot transliterate character between character sets |
---|---|
Author | Terry Johnson |
Post date | 2004-04-29T04:36:10Z |
Thanks Peter. I have been using UTF-8 / None and that's been up to the
task. But I presume if this was done on a key field, then sorting would
get interesting in the Chinese sense too. Oddly enough, I'm actually
using this method to store Chinese. They are BMP chars though.
Terry
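
A note on the sorting remark above: the following is a minimal Python sketch, not Firebird code. It assumes that a column declared with charset NONE is compared byte by byte, and that the stored bytes are valid UTF-8. Under those assumptions the sort order is plain code point order, which is stable and predictable but is not a linguistic order such as pinyin. The sample characters and the helper name `binary_sort` are made up for illustration.

```python
# Four BMP Chinese characters, written as escapes so the snippet is pure ASCII.
# In pinyin they read: zhong, yi, wen, a.
words = ["\u4e2d", "\u4e00", "\u6587", "\u963f"]


def binary_sort(strings):
    """Sort by raw UTF-8 bytes, as a byte-wise (binary) comparison would."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


by_bytes = binary_sort(words)
by_code_point = sorted(words)  # Python compares str values by code point

# UTF-8 preserves code point order, so both orderings agree.
print([f"U+{ord(w):04X}" for w in by_bytes])
print(by_bytes == by_code_point)
```

The result is yi, zhong, wen, a (U+4E00, U+4E2D, U+6587, U+963F), whereas an alphabetical pinyin listing would start with a. So a key field stored this way sorts consistently, just not in an order a Chinese reader would expect.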
peter_jacobi.rm wrote:
>Hi Terry, All,
>
>UNICODE support in Firebird is a mixed issue and gives
>interesting times (in the sense of the Chinese proverb)
>to developers.
>
>Terry Johnson <terry@s...> wrote:
>
>
>>How about UCS4 data? How do you go about entering that? Is that still
>>managed as a string entry?
>>
>>
>
>There is no direct support for non-BMP characters, so if
>you are asking about UCS4 because of your dire need to support
>Linear B or Byzantine Musical Notation (or more likely GB18030),
>you are somewhat out of luck. (Don't stop reading yet).
>
>Also, the feature of automatic character set conversion
>by Firebird itself turns into a bug if a charset conversion
>is called for that changes the byte length of the string.
>
>So you are essentially left with two models:
>
>a) Have some fixed database character set and use it also
>as your connection charset
>
>b) Use all the funny charsets you need in the database, connect
>using charset NONE, and do the necessary charset conversions
>in your software (or a middle layer, like a .NET provider).
>Requires FB 1.5.1
>
>So, again to the question of storing your UCS-4 character data:
>
>I see these options:
>
>A) Use char (4*N) character set OCTETS (FB's BITSTRING) to store
>the UCS-4 unchanged. Better store it big endian. Not pretty.
>
>B) Store as UTF16BE, using the fbintl DLL from pjcolkit:
>http://www.jodelpeter.de/i18n/fbarch/index.htm
>Untested
>
>C) Store UTF-8 in fields declared charset NONE. Ugly, but works.
>
>D) Use UNICODE_FSS, but if you really have non-BMP chars,
>it is better to store CESU-8 rather than the UTF-8 form:
>http://www.unicode.org/reports/tr26/
>Should work. Sort of. Feedback welcome.
>
>As you can see from the list, it's a rather awkward
>choice, but pragmatically speaking each of these options
>will work, only none of them gets an award for clean design.
>
>Also note that options A) and B) defeat the RLE compression
>for stored data and may be inefficient for this reason.
>
>Regards,
>Peter Jacobi
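
For reference, here is a small Python sketch of the byte sequences that options A) to D) in the quoted message would store for one non-BMP character. It is an illustration only: the helper names `cesu8` and `storage_forms` are invented here, the CESU-8 encoder is hand-rolled from the description in UTR #26 (each UTF-16 code unit is encoded as its own UTF-8 sequence), and nothing below has been checked against what Firebird accepts for a given column charset.

```python
def cesu8(text):
    """Encode text as CESU-8: each UTF-16 code unit becomes its own UTF-8 sequence."""
    utf16 = text.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    # 'surrogatepass' lets the surrogate halves through the UTF-8 encoder.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def storage_forms(text):
    """Return the raw bytes each option from the list above would store."""
    return {
        "A: UCS-4 big endian (OCTETS)": text.encode("utf-32-be"),
        "B: UTF-16BE": text.encode("utf-16-be"),
        "C: UTF-8 (in NONE)": text.encode("utf-8"),
        "D: CESU-8 (in UNICODE_FSS)": cesu8(text),
    }


sample = "\U00010000"  # LINEAR B SYLLABLE B008 A, a non-BMP character
for label, raw in storage_forms(sample).items():
    print(f"{label:30} {len(raw)} bytes  {raw.hex(' ')}")
```

For U+10000 this gives 4 bytes each for A), B) and C), and 6 bytes for D), since CESU-8 encodes the two surrogates separately as `ed a0 80 ed b0 80`. For BMP-only text, such as the Chinese data mentioned in the reply, CESU-8 and UTF-8 are byte-for-byte identical, so the distinction in option D) only matters once non-BMP characters appear.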