Subject Re: [Firebird-Architect] UTF-8 and Compression
Author Jim Starkey
Claudio Valderrama C. wrote:

>Olivier Mascia wrote:
>
>>So a simplified view of what a collation table would be is a simple
>>map of some unicode values to some other binary value. To sort
>>according to the collation, the sort of the characters is done based
>>on their unicode value passed through that map. Any character not
>>considered by the collation is not in the map and sorts according to
>>its original unicode value.
>
>Nice simplified view for a discussion, but now if we can concentrate on real
>needs, they would show the real dimension of the problem.
>
I don't think anyone was trying to trivialize the collation scheme. The
only difference between the proposed "universal utf-8" and the current
scheme is that collations are currently specific to a character set; if a
string is not in that character set, it must be converted before any of
the collation-specific functions are called. In the new scheme, all
collations would operate directly on utf-8, but their semantics would
remain unchanged. What a collation would do with a character outside
its natural character set would be up to the collation implementor (at
least until Adriano gets on his case).
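
A minimal sketch of the simplified view quoted above, applied directly
to utf-8 input, might look like the following Python. The table contents
and the names SAMPLE_COLLATION, sort_key and collate_sort are made up
for illustration; they are not part of Firebird, and a real collation
involves far more than a per-character map.

```python
# Hypothetical collation table: Unicode code point -> sort weight.
# This sample folds a few accented letters onto their base letters.
SAMPLE_COLLATION = {
    ord("é"): ord("e"),
    ord("è"): ord("e"),
    ord("à"): ord("a"),
}


def sort_key(utf8_bytes, collation):
    """Build a sort key from a utf-8 encoded string.

    Each character is passed through the collation map; a character
    the map does not mention keeps its original Unicode value.
    """
    text = utf8_bytes.decode("utf-8")
    return [collation.get(ord(ch), ord(ch)) for ch in text]


def collate_sort(strings, collation):
    """Sort utf-8 encoded strings according to the collation map."""
    return sorted(strings, key=lambda s: sort_key(s, collation))


words = [w.encode("utf-8") for w in ["zebra", "été", "etc", "eta"]]
ordered = [w.decode("utf-8") for w in collate_sort(words, SAMPLE_COLLATION)]
```

With this sample table, "été" sorts next to "etc" instead of after
"zebra", and a character the table does not mention simply sorts by its
Unicode value.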

An interesting related question (I think somebody else brought it up) is
whether a string containing "foreign" characters can be assigned to a
string declared with a specific character set (the character set
declaration is probably only advice on how long the physical field
should be).
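
To make the question concrete, here is a hedged sketch of how "foreign"
characters could be detected at assignment time. It uses Python's
built-in codecs as a stand-in for a declared character set;
foreign_characters is a hypothetical helper, and whether such an
assignment should be rejected, converted, or simply allowed is exactly
the open question.

```python
def foreign_characters(text, charset):
    """Return the characters of text that charset cannot represent.

    charset is a Python codec name (for example "latin-1"), used here
    only as a stand-in for a column's declared character set.
    """
    foreign = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            foreign.append(ch)
    return foreign


fits = foreign_characters("naïve", "latin-1")
does_not_fit = foreign_characters("naïve 日本", "latin-1")
```

An empty result means the string lies entirely inside the declared
character set; a non-empty one lists the characters the assignment
policy would have to decide about.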

--

Jim Starkey
Netfrastructure, Inc.
978 526-1376