Subject Re: UTF-8 (various)
Author johnson_dave2003
--- In Firebird-Architect@yahoogroups.com, "Ivan Prenosil"

Lester posted links to the UTF-8 website. I have just had a chance
to glance at it, and have seen that most of the questions we have
been throwing around about collations etcetera are dealt with in some
detail at the abstract level. I will need a week or so to digest it,
but some things are apparent from even a cursory reading.


At the abstract level, it is reasonable to support multiple
collations in the same result set where character sets do not
overlap. For example, a result set containing data in both Arabic
and French may have French and Iranian collations.

It might be reasonable to expect something like this:

select sport, stuff
from table
order by sport collation FR_FR, AR_IR


On the other hand, this should generate an error state:

select sport, stuff
from table
order by sport collation FR_FR, EN_US
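
To make the distinction between those two queries concrete, here is a
rough sketch in Python of the check I have in mind. Everything in it is
an assumption for illustration: the COLLATION_SCRIPTS table, the
function names, and the idea of guessing a script from the Unicode
character name are mine, not anything the engine or the standard
defines.

```python
import unicodedata

# Hypothetical table: which scripts each collation knows how to order.
COLLATION_SCRIPTS = {
    "FR_FR": {"LATIN"},
    "EN_US": {"LATIN"},
    "AR_IR": {"ARABIC"},
}


def check_collations(names):
    """Return a script -> collation map, or raise if two collations overlap."""
    owner = {}
    for name in names:
        for script in COLLATION_SCRIPTS[name]:
            if script in owner:
                # Two collations claim the same characters: ambiguous ordering.
                raise ValueError(
                    f"{owner[script]} and {name} both cover {script}"
                )
            owner[script] = name
    return owner


def script_of(text):
    """Crude script guess: first word of the Unicode name of the first letter."""
    for ch in text:
        if ch.isalpha():
            return unicodedata.name(ch).split()[0]
    return None
```

With this, check_collations(["FR_FR", "AR_IR"]) succeeds because the
two collations cover disjoint scripts, while
check_collations(["FR_FR", "EN_US"]) raises an error because both claim
Latin. script_of() shows one way each value could be routed to the
collation that owns its script.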


Normalization of characters will be critical. The questions are where
it should happen, and which normalization form should be applied.

Unicode, and therefore UTF-8, allows a single character to be expressed
either as one precomposed code point or as a base character followed by
its combining marks. The two forms encode to different byte sequences.
Normalizing strings allows comparisons to occur without having to code
for every case individually.
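
A small illustration, using Python's standard unicodedata module. The
helper name same_text and the choice of NFC as the default form are my
assumptions; which form we should actually use is still an open
question.

```python
import unicodedata

composed = "\u00e9"      # e-acute as a single code point
decomposed = "e\u0301"   # plain e followed by COMBINING ACUTE ACCENT


def same_text(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ, and so do their UTF-8 bytes.
print(composed == decomposed)
print(composed.encode("utf-8"), decomposed.encode("utf-8"))

# After normalization they compare equal.
print(same_text(composed, decomposed))
```

A byte-wise comparison treats the two strings as different, which is
exactly the case we would otherwise have to code for individually.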

I am in favor of normalizing characters at the session level, at the
same time as other character set type conversions should be taking
place.