Subject | Re: [firebird-support] Storing Delphi 2009 "UnicodeString" into database, UTF8? |
---|---|
Author | Stefan Heymann |
Post date | 2009-05-20T22:33:18Z |
> But even "length in Unicode codepoints" is not 1-1 with "length in
> graphically rendered characters" because you have diacritics that have
> their own codepoints but apply the diacritic to the preceding codepoint,
> resulting in a composite character.
> For example, the letter Ä can be coded in two ways with Unicode:
> 1. The codepoint for the letter Ä.
> 2. The codepoint for the letter A followed by the codepoint for the
> diacritic "umlaut".

The rendering thing is a completely different beast. Before you go to
rendering, you must be able to handle the storage stuff (which is
difficult enough, as you can see in this discussion).
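To make the storage side concrete, here is a small Python sketch. Python is used only because its standard library exposes code points directly; nothing here is Delphi or Firebird API. It measures the two spellings of Ä from the quoted example. `storage_sizes`, `PRECOMPOSED` and `DECOMPOSED` are names made up for illustration.

```python
# Hypothetical names for illustration: the two spellings of "Ä"
# from the quoted example, compared at the storage level.
PRECOMPOSED = "\u00C4"  # LATIN CAPITAL LETTER A WITH DIAERESIS
DECOMPOSED = "A\u0308"  # LATIN CAPITAL LETTER A + COMBINING DIAERESIS


def storage_sizes(s: str) -> tuple[int, int]:
    """Return (number of code points, number of UTF-8 bytes) for s."""
    return len(s), len(s.encode("utf-8"))


if __name__ == "__main__":
    for label, s in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        points, octets = storage_sizes(s)
        print(f"{label}: {points} code point(s), {octets} UTF-8 byte(s)")
    # Same visible letter, yet a plain comparison says they differ.
    print(PRECOMPOSED == DECOMPOSED)
```

Run as a script, it should report 1 code point / 2 bytes for the precomposed form, 2 code points / 3 bytes for the decomposed one, and then `False`: the same visible letter, stored as different byte sequences, is not equal under a naive comparison.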
Diacritics on Latin letters, as in your example above, are simple
compared with diacritics in Arabic script, for example. There,
characters can "melt" into completely different glyphs. The
Unicode Standard explains these things well.
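Actual shaping is the job of the text renderer, so it can't be demonstrated with a standard library alone. What a rough sketch can show is that Unicode itself records the problem: the Arabic letter BEH has four legacy "presentation form" code points, one per positional glyph (isolated, final, initial, medial), and compatibility normalization (NFKC) folds all of them back to the single abstract letter. The names `BEH`, `BEH_FORMS` and `describe` are illustrative only.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH (the abstract letter)
# Legacy compatibility code points, one per positional glyph of BEH:
# isolated, final, initial, medial.
BEH_FORMS = "\uFE8F\uFE90\uFE91\uFE92"


def describe(ch: str) -> str:
    """Name, decomposition mapping and bidi class of one code point."""
    return (f"{unicodedata.name(ch)} | {unicodedata.decomposition(ch)}"
            f" | {unicodedata.bidirectional(ch)}")


if __name__ == "__main__":
    for ch in BEH_FORMS:
        print(describe(ch))
    # NFKC maps every positional form back to the one abstract letter.
    print(all(unicodedata.normalize("NFKC", ch) == BEH for ch in BEH_FORMS))
    # Arabic letters have bidi class "AL" (right-to-left); Latin letters "L".
    print(unicodedata.bidirectional(BEH), unicodedata.bidirectional("A"))
```

The point is that one stored character can correspond to several glyphs depending on its neighbours, and the direction of the text is a property of the characters themselves.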
Using Unicode in an application doesn't just mean that you lose the
1:1 relationship between bytes (octets) and characters. You also lose
the 1:1 relationship between characters and glyphs. You cannot be sure
whether your text runs left-to-right or right-to-left. Uppercasing,
lowercasing and case-insensitive comparisons turn from relatively simple
operations into an algorithm that fills a whole chapter of the Unicode
Standard. Comparing two strings is another chapter (you must normalize
them before you can compare them, and there are four different ways to
do that). Etc., etc.
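A minimal Python sketch of the comparison points above, using only the standard `unicodedata` module and `str.casefold`. It shows the four normalization forms (NFC, NFD, NFKC, NFKD), a canonical comparison that treats both spellings of Ä as equal, and a case-insensitive key built as NFD(casefold(NFD(s))), which follows the Unicode definition of a canonical caseless match. The helper names are hypothetical.

```python
import unicodedata

PRECOMPOSED = "\u00C4"  # Ä as a single code point
DECOMPOSED = "A\u0308"  # A followed by COMBINING DIAERESIS
NORMALIZATION_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def canonically_equal(a: str, b: str) -> bool:
    """Equal after normalizing both strings to the same form (NFC here)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def canonical_caseless_key(s: str) -> str:
    """NFD(casefold(NFD(s))): a key for canonical caseless matching."""
    return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())


def caseless_equal(a: str, b: str) -> bool:
    return canonical_caseless_key(a) == canonical_caseless_key(b)


if __name__ == "__main__":
    for form in NORMALIZATION_FORMS:
        print(form, len(unicodedata.normalize(form, PRECOMPOSED)))
    print(PRECOMPOSED == DECOMPOSED, canonically_equal(PRECOMPOSED, DECOMPOSED))
    # Case mapping can change length: "straße" uppercases to "STRASSE".
    sharp_s_word = "stra\u00DFe"
    print(sharp_s_word.upper(), len(sharp_s_word), len(sharp_s_word.upper()))
    # "\u00E4" is lowercase ä; it matches the decomposed uppercase spelling.
    print(caseless_equal("STRASSE", sharp_s_word), caseless_equal("\u00E4", DECOMPOSED))
```

Expected output: the forms give lengths 1, 2, 1, 2; then `False True`; then `STRASSE 6 7`; then `True True`. Note that this default matching is not locale-aware: languages such as Turkish (dotted and dotless i) need tailored case mapping on top of it.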
So, whoever thinks Unicode is something like Latin-1 on 16-bit
steroids (and, to me, it looks like the Borland [or whatever they call
themselves today] guys do): YOU ARE WRONG!
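The "16-bit steroids" picture breaks as soon as text leaves the Basic Multilingual Plane. Delphi 2009's UnicodeString is UTF-16, so its length counts 16-bit code units, and a code point above U+FFFF takes two of them (a surrogate pair). This Python sketch, with made-up helper names, shows that for U+1D11E MUSICAL SYMBOL G CLEF. It also shows that everyday characters such as € (U+20AC) have no Latin-1 encoding at all.

```python
G_CLEF = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, beyond U+FFFF


def utf16_code_units(s: str) -> int:
    """Number of 16-bit units, i.e. what a UTF-16 string length counts."""
    return len(s.encode("utf-16-le")) // 2


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def fits_latin1(s: str) -> bool:
    """True if every character of s has a Latin-1 (ISO 8859-1) encoding."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # code points, UTF-16 code units, UTF-8 bytes
    print(len(G_CLEF), utf16_code_units(G_CLEF), len(G_CLEF.encode("utf-8")))
    print([hex(u) for u in surrogate_pair(ord(G_CLEF))])
    print(fits_latin1("\u00C4"), fits_latin1("\u20AC"), fits_latin1(G_CLEF))
```

Expected output: `1 2 4`, then `['0xd834', '0xdd1e']`, then `True False False`. One character, two UTF-16 units, four UTF-8 bytes: none of the three "lengths" agree.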
Best Regards
Stefan