Subject | Re: [firebird-support] Storing Delphi 2009 "UnicodeString" into database, UTF8? |
---|---|
Author | Kjell Rilbe |
Post date | 2009-05-21T21:31:48Z |
Stefan Heymann wrote:
> The rendering thing is a completely different beast. Before you go to
> rendering, you must be able to handle the storage stuff (which is
> difficult enough, as you can see in this discussion).
>
> Things like diacritics on Latin letters, as in your example above, are
> simple compared with diacritics in Arabic script, for example.
> Here characters can "melt" into completely different glyphs. The
> Unicode Standard explains these things well.
>
> Using Unicode in an application doesn't just mean that you lose the
> 1:1 relationship between bytes (octets) and characters. You also lose
> the 1:1 relationship between characters and glyphs. You cannot be sure
> whether your text runs left-to-right or right-to-left. Uppercasing,
> lowercasing and case-insensitive comparisons turn from relatively simple
> operations into algorithms that fill a complete chapter of the Unicode
> Standard. Comparing two strings is another chapter (you must normalize
> them before you can compare them, and there are 4 different ways to do
> that). Etc. etc.
>
> So, whoever thinks Unicode is something like Latin-1 on 16-bit
> steroids (and, to me, it looks like the Borland [or whatever they call
> themselves today] guys do): YOU ARE WRONG!

Nice sum-up! :-)
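Stefan's point that strings must be normalized before they can be compared, and that case mapping stops being a simple per-character lookup, can be illustrated with a minimal sketch. It assumes Python 3 and its standard `unicodedata` module and `str.casefold`, not anything in Delphi or Firebird. The helper `caseless_equal` is hypothetical; it follows the "canonical caseless match" recipe from the Unicode Standard, NFD(casefold(NFD(s))).

```python
import unicodedata

# "DÄR" stored two ways: precomposed Ä (U+00C4) vs. A + combining diaeresis (U+0308)
precomposed = "D\u00c4R"
decomposed = "DA\u0308R"

# The four normalization forms defined by the Unicode Standard
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def caseless_equal(a, b):
    """Hypothetical helper: canonical caseless match, NFD(casefold(NFD(s)))."""
    def key(s):
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


# A binary comparison sees two different strings...
print(precomposed == decomposed)  # False
# ...but they agree once both sides use the same normalization form.
print(all(unicodedata.normalize(f, precomposed) == unicodedata.normalize(f, decomposed)
          for f in FORMS))  # True

# Full case mapping can change the length: "ß" uppercases to "SS".
print("straße".upper(), len("straße"), len("straße".upper()))  # STRASSE 6 7
print(caseless_equal("STRASSE", "straße"))  # True
print(caseless_equal(precomposed, "da\u0308r"))  # True
```

Note that a naive "uppercase both and compare" would also miss the precomposed/decomposed case; normalization and case folding have to be combined.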

Anyway, I was just trying to say that if you want your string lib to
treat a UTF-16 string as anything other than a sequence of 16-bit
words, then you've got *a lot* of things to consider!

It's not just a matter of differentiating between 16-bit words and
Unicode code points.
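The gap between 16-bit words and code points appears as soon as a character lies outside the Basic Multilingual Plane: it then needs two UTF-16 code units (a surrogate pair). A minimal sketch, assuming Python 3: Python strings are indexed by code point, so the UTF-16 view is obtained by encoding explicitly. The helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(s):
    """Hypothetical helper: the 16-bit words a UTF-16 string library would see."""
    data = s.encode("utf-16-le")  # little-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


text = "A\U0001D11EB"  # 'A', MUSICAL SYMBOL G CLEF (U+1D11E), 'B'
units = utf16_code_units(text)

print(len(text))                    # 3 code points
print(len(units))                   # 4 code units
print([f"{u:04X}" for u in units])  # ['0041', 'D834', 'DD1E', '0042']

# Indexing by 16-bit word lands in the middle of the surrogate pair:
print(0xDC00 <= units[2] <= 0xDFFF)  # True: a lone low surrogate, not a character
```

Counting or slicing by code units can therefore split one character in two. Counting by code points avoids that, but it still does not match what a reader perceives as a character.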

For "DÄR" with Ä encoded as A followed by a combining diaeresis ("DA¨R"),
what exactly does substring(MyDÄRstring, 3, 1) mean? Is it the diacritic
'¨' or the 'R'?
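The ambiguity in that question can be reproduced with a minimal sketch, assuming Python 3 and its `unicodedata` module. The `substring` helper is hypothetical and mimics a 1-based SQL-style SUBSTRING over code points. `naive_graphemes` is also hypothetical and deliberately simplified: it only attaches combining marks (general category M*) to the preceding character. Real user-perceived character boundaries are defined by the extended grapheme cluster rules of UAX #29, which cover many more cases (Hangul syllables, emoji sequences, and so on).

```python
import unicodedata


def substring(s, start, length):
    """Hypothetical 1-based SUBSTRING over code points."""
    return s[start - 1:start - 1 + length]


def naive_graphemes(s):
    """Hypothetical: group each base character with the combining marks after it.
    A simplification; full UAX #29 segmentation handles much more."""
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # combining mark joins the previous cluster
        else:
            clusters.append(ch)
    return clusters


nfc = unicodedata.normalize("NFC", "DA\u0308R")  # "DÄR", 3 code points
nfd = unicodedata.normalize("NFD", "D\u00c4R")   # "DA¨R", 4 code points

print(f"U+{ord(substring(nfc, 3, 1)):04X}")  # U+0052, the 'R'
print(f"U+{ord(substring(nfd, 3, 1)):04X}")  # U+0308, the combining diaeresis
print(len(naive_graphemes(nfc)), len(naive_graphemes(nfd)))  # 3 3
print(naive_graphemes(nfd)[2])  # R
```

Counting by code points, position 3 is 'R' in the precomposed form but the diaeresis in the decomposed one; counting by (approximate) grapheme clusters, it is 'R' either way.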
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-mail: kjell@...
Phone: 08-761 06 55
Mobile: 0733-44 24 64