Subject Re: [Firebird-Architect] Re: UTF-8 vs UTF-16
Author Olivier Mascia
Hello David,

Tuesday, August 26, 2003, 5:24:41 AM, you wrote:

DS> While I certainly will not question a native French speaker's
DS> statement regarding correct French lexical ordering,

You could. Never mind. :)

DS> I will point out that the lexical ordering implemented in
DS> FR_FR, as well as the dBASE & Paradox French orderings are
DS> more complex than what you outlined above.
DS> If the correct ordering for French dictionary order is different,
DS> we need to correct it.

Hmm. I actually don't use any collation with my FB databases because I
was never very satisfied with them. There might be a cause-and-effect
relation there.

In a French dictionary, accented characters don't appear before or
after the same letters without accents. They are sorted with
everything else, as if the accent were not there. That might not be
the common behaviour of French-language sorting in the computing
world, though.

DS> In FR_FR
DS> "deja VU" (d <e-accent-acute> j a V U, in case it doesn't survive email)
DS> is collated as if it were written

DS> DEJAVU@____'_@dejaVU@____ __

David, here are some extracts from a French dictionary:

eau-de-vie (no accents, means brandy)
ébahi (e-acute bahi, means staggered)
...
écarter (e-acute carter, means to spread)
ecchymose (no accents, means bruise)
ecclésiastique (eccl e-acute siastique, means ecclesiastical)
écervelé (e-acute cervel e-acute, means scatty)
...

So you can clearly see that the sequence does not take accents into
account. This is true for all accents in French. These statements are
based on everyday usage in French, but I'm sure I could get some
official word on it somewhere. I'll have a look.

But I would say that for most French speakers, a sort order in which
accented letters come before or after the same letters without
accents seems illogical, at least compared to the dictionary lookup
people are used to.
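
To make the dictionary order above concrete, here is a minimal sketch in
Python. It is not how FR_FR or any Firebird collation is implemented; the
names strip_accents and dictionary_key are made up for illustration, and
the approach (decompose to NFD, drop combining marks, compare what is
left) is only an assumption about how such an ordering could be
approximated.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining marks (accents) removed."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dictionary_key(word):
    # Primary key: the letters with accents ignored.
    # The original word is only a tie-breaker, so that two words differing
    # solely by accents still sort in a deterministic order.
    return (strip_accents(word), word)

# The dictionary extract quoted above, deliberately shuffled.
words = ["écervelé", "ecchymose", "ébahi", "eau-de-vie",
         "ecclésiastique", "écarter"]

by_dictionary = sorted(words, key=dictionary_key)
print(by_dictionary)
# ['eau-de-vie', 'ébahi', 'écarter', 'ecchymose', 'ecclésiastique', 'écervelé']
```

A plain sorted(words) compares code points instead, so with precomposed
characters every word starting with 'é' would land after all words
starting with 'e'.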

Now about comparing: testing the strings "école" (e-acute cole) and
"ecole" (no accents) for equality should return FALSE (different). Of
course, "ecole" (no accent) is misspelled, but that should not make an
equality comparison succeed.

On the other hand, looking for all words starting with "eco" (no
accents) should ideally return "école" (e-acute cole) as well as words
starting with the unaccented 'e'.
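
The two behaviours described above (strict equality, but accent-blind
prefix search) can be sketched side by side. Again this is only an
illustration in Python under my own assumptions: exactly_equal and
starts_with_ignoring_accents are hypothetical helpers, not functions of
any database engine.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining marks (accents) removed."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def exactly_equal(a, b):
    # Equality keeps accents significant. Normalizing to NFC first only
    # makes sure that two encodings of the same accented letter compare equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def starts_with_ignoring_accents(word, prefix):
    # Prefix search ignores accents on both sides.
    return strip_accents(word).startswith(strip_accents(prefix))

words = ["école", "ecole", "économie", "écarter", "eau-de-vie"]

print(exactly_equal("école", "ecole"))   # False
matches = [w for w in words if starts_with_ignoring_accents(w, "eco")]
print(matches)                           # ['école', 'ecole', 'économie']
```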

In a case-insensitive sort, 'e', 'é' (e-acute), 'E', 'É' (E-acute)
should all be equivalent. In a typical case-sensitive sort, on the
other hand, 'e' and 'é' would be equal to each other and sort AFTER
'E' and 'É' (which are equal to each other too).
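
As a rough sketch of these two sort modes, one could use two key
functions in Python. The names nocase_key and case_key are hypothetical,
and the uppercase-before-lowercase result in the case-sensitive variant
simply falls out of code-point order on the unaccented letters; a real
collation would define that ordering explicitly.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining marks (accents) removed."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def nocase_key(text):
    # Case-insensitive: ignore accents AND case.
    return strip_accents(text).casefold()

def case_key(text):
    # Case-sensitive: ignore accents only. 'E' (U+0045) precedes
    # 'e' (U+0065), so uppercase sorts before lowercase.
    return strip_accents(text)

letters = ["é", "E", "e", "É"]

# All four letters collapse to the same case-insensitive key.
print({nocase_key(ch) for ch in letters})   # {'e'}

# Case-sensitive: 'E' and 'É' tie, 'e' and 'é' tie, uppercase comes first.
# sorted() is stable, so tied letters keep their input order.
print(sorted(letters, key=case_key))        # ['E', 'É', 'é', 'e']
```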

--
Best regards,
Olivier Mascia