Subject Re: [Firebird-Architect] UTF-8 and Compression
Author Lester Caine
Paul Reeves wrote:

> On Tuesday 01 March 2005 09:37, Olivier Mascia wrote:
>
>>So a simplified view of what a collation table would be is a simple map
>>of some unicode values to some other binary value. To sort according to
>>the collation, the sort of the characters is done based on their
>>unicode value passed through that map. Any character not considered by
>>the collation is not in the map and sorts according to its original
>>unicode value.
>
> How are descending sorts ordered? I can see one argument that says
> characters outside the collation char set should still get sorted to the
> bottom of the list. But on the other hand some users might reasonably
> expect to see a full descending sort.

All of these sorts of points are why *I* see unicode as an option rather
than the basis for everything.

A simple 256 byte table for the core character sets, and a table of
1114112 entries at 3 or 4 bytes each for unicode. (I think 1114112 is the
total number of possible unicode characters, but only 233915 are
currently allocated?)
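
To put rough numbers on those two table sizes, here is a minimal sketch of the arithmetic. The names are purely illustrative, and it assumes one fixed-width weight per code point with nothing left out of the table:

```python
# Unicode code points run from U+0000 to U+10FFFF inclusive.
MAX_CODE_POINTS = 0x10FFFF + 1

def table_bytes(entries, bytes_per_entry):
    """Size in bytes of a flat lookup table with one fixed-width entry per character."""
    return entries * bytes_per_entry

single_byte_table = table_bytes(256, 1)               # core 8-bit character set
unicode_table_3 = table_bytes(MAX_CODE_POINTS, 3)     # packed 3-byte entries
unicode_table_4 = table_bytes(MAX_CODE_POINTS, 4)     # 32-bit aligned entries

print(MAX_CODE_POINTS, single_byte_table, unicode_table_3, unicode_table_4)
```

The 4-byte case comes to 4456448 bytes per table, before multiplying by the number of collations.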

On the other hand, could there be some means of selecting which pages of
the unicode map a collation handles, and 'ignoring' the pages that are
not required for that collation? http://www.unicode.org/charts/ provides
a nice basis for getting that to work. It includes the basis for
Normalisation, Collation and Case Mapping, and is broken down into nice
chunks.
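
One way the page idea could look, as a rough sketch only: the 256-character page size, the function names and the fall-back to the raw code point value for 'ignored' pages are all my assumptions here, not anything that exists in the engine.

```python
PAGE_SIZE = 256  # assumed page size: 256 code points per page

def build_paged_table(weights):
    """Build a sparse table holding only the pages a collation touches.

    weights maps code point -> collation weight (hypothetical input format).
    """
    pages = {}
    for code_point, weight in weights.items():
        page, offset = divmod(code_point, PAGE_SIZE)
        if page not in pages:
            # Characters on a handled page default to their own code point value.
            base = page * PAGE_SIZE
            pages[page] = [base + i for i in range(PAGE_SIZE)]
        pages[page][offset] = weight
    return pages

def lookup(pages, ch):
    """Return the weight for ch; pages the collation ignores use the code point."""
    code_point = ord(ch)
    page, offset = divmod(code_point, PAGE_SIZE)
    table = pages.get(page)
    if table is None:
        return code_point
    return table[offset]

# Two Latin characters get explicit weights; both live on page 0.
pages = build_paged_table({ord("a"): 1, ord("\u00e9"): 2})
print(len(pages), lookup(pages, "a"), lookup(pages, "b"), lookup(pages, "\u4e2d"))
```

A collation that only cares about a few scripts then stores a few pages rather than an entry for every possible character.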

This *IS* handled by a lookup table giving 'weights', isn't it? The order
is provided by building a copy of the string using the character
weights, and sorting on THAT column? So potentially we have a number of
lookup tables of about 4.5MB each (1114112 entries, 32bit aligned), one
for each 'collation': Up, Down, Upper Case, Lower Case, Normalised?
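
As I understand the scheme, it could be sketched like this. The weight map is a toy one made up for illustration (upper case letters take the weight of their lower case form), and characters missing from the map keep their unicode value, as Olivier describes:

```python
import string

# Toy collation (an assumption for illustration): 'A'..'Z' weigh the same as 'a'..'z'.
weights = {ord(u): ord(u.lower()) for u in string.ascii_uppercase}

def sort_key(text, weights):
    """Build the 'copy of the string' made of weights instead of characters."""
    # Characters not in the map fall back to their own code point value.
    return [weights.get(ord(ch), ord(ch)) for ch in text]

words = ["cherry", "Banana", "apple"]

by_code_point = sorted(words)
ascending = sorted(words, key=lambda w: sort_key(w, weights))
# A 'full' descending sort just reverses the comparison on the same keys.
descending = sorted(words, key=lambda w: sort_key(w, weights), reverse=True)

print(by_code_point)
print(ascending)
print(descending)
```

Reversing the comparison gives the full descending sort Paul mentions. Keeping unmapped characters at the bottom in both directions would need a separate 'Down' table or an extra flag in the key, which is where the extra tables come from.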

--
Lester Caine
-----------------------------
L.S.Caine Electronic Services