Subject Re: [Firebird-Architect] UTF-8 and Compression
Author Olivier Mascia
On 1 March 2005, at 19:49, Ann W. Harrison wrote:

> Actually, the collation problem will continue to be more complicated
> than that because most serious collations are multi-level. There
> isn't a simple byte for byte transformation that makes the sort "work".

I generally agree.

> The way the current collations work is to assign a byte value to the
> "base" character (e.g. 'A'), then append bits that give value to
> different accents, upper vs lower case, and the following white space
> or punctuation.
>
> Here is Dave Schnepper's explanation of the issue:
>
> The InterBase collation orders for ISO8859 (such as SV_SV) follow a
> full linguistic (eg: dictionary) collation order. In such a collation
> order spaces (and other punctuation marks) are of 4th level importance.
>
> First order: A is different than B
> 2nd order: A is different from A-accent-grave
> 3rd order: A is different than a
> 4th order: The type of punctuation mark is important.
>
> For instance:
> Redwing
> Red wing
> Red-wing
> Redwood
> Red wood
> Red worm
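
As a rough illustration of the four levels Dave describes, here is a
minimal Python sketch of a multi-level sort key. The name multilevel_key
and the tie-break directions (unaccented before accented, lowercase
before uppercase, no punctuation before space before hyphen) are
assumptions made for the example; this is not how the InterBase
collation drivers are actually implemented.

```python
import unicodedata

def multilevel_key(text):
    """Hypothetical 4-level key: base letters, accents, case, punctuation."""
    base, accents, case, punct = [], [], [], []
    # NFD splits an accented letter into base letter + combining mark(s).
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Level 2: remember which accent sits on which letter position.
            accents.append((len(base), ord(ch)))
        elif ch.isalnum():
            # Level 1: the base letter, ignoring case.
            base.append(ch.lower())
            # Level 3: case of that letter (False sorts before True).
            case.append(ch.isupper())
        else:
            # Level 4: spaces and punctuation, with their position.
            punct.append((len(base), ord(ch)))
    return ("".join(base), tuple(accents), tuple(case), tuple(punct))

words = ["Red worm", "Red-wing", "Redwood", "Red wing", "Red wood", "Redwing"]
dictionary_order = sorted(words, key=multilevel_key)

for w in dictionary_order:
    print(w)
```

Because punctuation only enters the key at the last level, "Redwing",
"Red wing" and "Red-wing" compare equal on the first three levels and
are separated only by the fourth.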

Well... Now show that result to any French person who uses a computer
for anything other than the very specific task of writing the next
edition of the "Larousse" French dictionary, and they will cry out:
"What's this mess! Your software is buggy".

Most French people using computers will expect this to sort as:

> Red wing
> Red wood
> Red worm
> Red-wing
> Redwing
> Redwood

To "justify" that they are right and that your software is wrong, those
French people will throw the ASCII codes at you to show that space and
hyphen come before letters! Isn't that stunning?
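
The order these users expect is simply what a plain code point
comparison produces. A minimal Python sketch, assuming pure ASCII
strings (Python's default string comparison is by code point, not by
any locale-aware collation):

```python
words = ["Redwing", "Red wing", "Red-wing", "Redwood", "Red wood", "Red worm"]

# Default str comparison is by code point:
# space (0x20) < hyphen (0x2D) < letters.
binary_order = sorted(words)

for w in binary_order:
    print(w)
```

Here the space and the hyphen take part in the comparison at their
position in the string, so every "Red ..." entry sorts ahead of
"Red-wing", which in turn sorts ahead of "Redwing".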

Now, Dave's general comment probably applies with more force to some
other languages, so the problem has to be handled correctly.

My point is that there is a large difference between what people expect
from general business software and what is presented to them when
they open the latest edition of a famous French dictionary. The same
might very well apply to other languages and cultures.

Also, there is a danger in sorting things in an academically perfect
way when programmers don't have the same facilities in their
programming language or operating system. Imagine that Firebird sorts
French as indicated by Dave. What mess will happen when the application
itself can't sort in the same way (when sorting in memory, for
instance)? Just try to make Windows sort a French string using Dave's
full recipe...

;-)

--
Olivier Mascia