firebird-architect - RE: [Firebird-Architect] Re: UTF-8 vs UTF-16

Subject	RE: [Firebird-Architect] Re: UTF-8 vs UTF-16
Author	David Schnepper
Post date	2003-08-26T03:24:41Z

> -----Original Message-----
> From: Olivier Mascia [mailto:om@...]
> Sent: Monday, August 25, 2003 4:29 AM
> To: Firebird-Architect@yahoogroups.com
> Subject: Re: [Firebird-Architect] Re: UTF-8 vs UTF-16
>
>
>
> Regarding the French language, characters with accents should sort as
> the same characters without accents, just as in a French dictionary.
> I don't know about Spanish, but I assume this rule is mostly valid, with
> maybe some exceptions that could justify some collating rules.
>

While I certainly will not question a native French speaker's
statement regarding correct French lexical ordering,
I will point out that the lexical ordering implemented in
FR_FR, as well as the dBASE & Paradox French orderings, is
more complex than what you outlined above.
If correct French dictionary order differs from what FR_FR implements,
we need to fix the collation.

In FR_FR,
"déja VU" (d <e-accent-acute> j a V U, in case it doesn't survive email)
is collated as if it were written

DEJAVU@____'_@dejaVU@____ __

Where @ is a special code that sorts lower than any other value,
and _ sorts lower than any accent or punctuation.

Or, in words,
collate first by base character (A...Z),
second by accents, compared backwards from the end of the string
(with a specific order for each accent),
third by upper vs. lower case (with upper before lower),
fourth by punctuation characters (in a specific order),
treating space as a punctuation character.
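
A minimal Python sketch of the four levels described above. It is not the
actual FR_FR implementation: the accent weights and display marks in
ACCENTS, the function names, and the use of code points to order
punctuation are all assumptions made for illustration.

```python
import unicodedata

# Assumed accent weights and display marks; the real FR_FR tables are not
# reproduced here. Weight 0 / mark "_" means "no accent".
ACCENTS = {
    "\u0301": (1, "'"),   # acute
    "\u0300": (2, "`"),   # grave
    "\u0302": (3, "^"),   # circumflex
    "\u0308": (4, '"'),   # diaeresis
    "\u0327": (5, ","),   # cedilla
}

def levels(text):
    """Split text into the four per-level lists (accents already reversed)."""
    base, accent, case, punct = [], [], [], []
    for ch in unicodedata.normalize("NFC", text):
        letter, *marks = unicodedata.normalize("NFD", ch)
        if letter.isalpha():
            base.append(letter.upper())
            # Only the first combining mark is considered in this sketch.
            accent.append(ACCENTS.get(marks[0], (9, "?")) if marks else (0, "_"))
            case.append(letter)
            punct.append(None)   # placeholder slot for a letter
        else:
            punct.append(ch)     # space and punctuation only count at level 4
    return base, accent[::-1], case, punct

def sort_key(text):
    """Tuple that compares level by level: base, accents, case, punctuation."""
    base, accent, case, punct = levels(text)
    return (
        "".join(base),
        tuple(weight for weight, _ in accent),
        tuple(0 if c.isupper() else 1 for c in case),          # upper first
        tuple(0 if p is None else ord(p) + 1 for p in punct),  # assumed order
    )

def display(text):
    """Render the key in the '@'-separated notation used in this message."""
    base, accent, case, punct = levels(text)
    return "@".join([
        "".join(base),
        "".join(mark for _, mark in accent),
        "".join(case),
        "".join("_" if p is None else p for p in punct),
    ])

# "d\u00e9ja VU" is d, e-acute, j, a, space, V, U
print(display("d\u00e9ja VU"))   # DEJAVU@____'_@dejaVU@____ __
```

Because the accent level is reversed before comparison, sorting "cote",
"c\u00f4te", "cot\u00e9", "c\u00f4t\u00e9" (cote with and without circumflex
and acute) with key=sort_key puts the word whose accent falls later in the
string ahead of the word whose accent falls earlier.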

The collation for Spanish is similar, with these additional
cases:
ñ <n-tilde> --> sorts as a primary letter, after n, before o.
ll --> this two-letter sequence sorts as a primary letter, after l, before m.
ch --> this two-letter sequence sorts as a primary letter, after c, before d
(and, of course, special cases for LL, Ll, lL, CH, Ch, cH...).
Unlike French, Spanish does not put accents in reverse order
during second-level collation.
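
The primary level for Spanish, as described above, could be sketched in
Python roughly as follows. The names ALPHABET, RANK, and spanish_primary
are hypothetical, and only the first (base letter) level is shown; the
accent, case, and punctuation levels are left out.

```python
import unicodedata

# Primary alphabet as described above: "ch" after "c", "ll" after "l",
# and n-tilde after "n".
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "\u00f1", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x",
    "y", "z",
]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def spanish_primary(word):
    """Return the primary-level ranks of word as a tuple of ints."""
    # Reduce each character to its base letter, keeping n-tilde distinct.
    # Lowercasing first covers CH, Ch, cH, LL, Ll, lL in one step.
    letters = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        base = ch if ch == "\u00f1" else unicodedata.normalize("NFD", ch)[0]
        # Non-letters become a separator so they cannot form a digraph.
        letters.append(base if base in RANK else " ")
    text = "".join(letters)

    ranks, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("ch", "ll"):
            ranks.append(RANK[pair])   # two letters, one primary weight
            i += 2
        else:
            if text[i] != " ":
                ranks.append(RANK[text[i]])
            i += 1
    return tuple(ranks)

words = ["oso", "llama", "chico", "\u00f1u", "luz", "cuna", "nube", "dama", "mano"]
print(sorted(words, key=spanish_primary))
```

In this sketch a separator between "l" and "l" (as in "al lado") prevents
the pair from being read as the single letter "ll"; whether the real
collation behaves that way is not stated here, so treat it as an assumption.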

Dave