Subject Re: Collations
Author peter_jacobi.rm
Hi Ales,

I'm copying this message to Firebird-Architect, as it seems
to be more on topic there. I suggest you join Firebird-Architect
to discuss this thread further. Perhaps some of the Elders will
advise us whether we should move to the developer list, but
I would prefer Firebird-Architect.

--- In firebird-support@yahoogroups.com, Ales Smodis wrote:
> I should mention
> though that I'll be working under linux and will thus be able to
> generate only appropriate .so libs

Fine, I'm on Win32 and so we can test both sides of the chasm.

> Errr... Since the future seems to be unicode and Firebird does
> transliteration through unicode mappings anyway,

Yes, but not collations. The collation algorithm has to be
provided per charset. This actually makes sense, as a sort key
implementing the full UNICODE collation algorithm for the entire
UNICODE repertoire typically needs 32 bits per character, whereas
for each ISO-8859-* repertoire it should be possible to squeeze
this to 12 bits per character. As index key length is a scarce
resource in Firebird, this helps a lot.
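
To put rough numbers on this, here is a small sketch of the
arithmetic. The 252-byte key limit is only an assumed figure for
illustration (substitute the limit of your own server version),
and the function name is made up for this example.

```python
def max_indexable_chars(key_limit_bytes: int, bits_per_char: int) -> int:
    """Number of characters whose sort key fits into the key limit."""
    return (key_limit_bytes * 8) // bits_per_char


# Assumed, illustrative index key limit in bytes (not from this thread).
KEY_LIMIT_BYTES = 252

if __name__ == "__main__":
    # 32 bits/char: full UNICODE collation key, as estimated above
    print(max_indexable_chars(KEY_LIMIT_BYTES, 32))  # 63
    # 12 bits/char: per-charset key for an ISO-8859-* repertoire
    print(max_indexable_chars(KEY_LIMIT_BYTES, 12))  # 168
```

Under that assumption, the compact per-charset key lets you index
columns well over twice as long.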

> I went to see how
> unicode guys handle collations.
> [...]
> http://www.unicode.org/charts/collation/

Yes, I know that link. This gives the default UNICODE
collation. All locale-specific collations should be defined
in terms of changes to this collation.
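
As a toy illustration of "changes to the default collation", the
sketch below starts from a tiny stand-in for the default order and
applies "put X directly after Y" rules. The five-letter default
table, the rule format and the function names are all invented for
this example; the real default table and tailoring syntax are far
richer (multiple weight levels, digraphs such as dz-with-caron, and
so on). The rules shown follow the Croatian alphabet, where č and ć
come after c and đ comes after d.

```python
# Toy stand-in for the default collation order (hypothetical).
DEFAULT_ORDER = ["a", "b", "c", "d", "e"]

# Hypothetical tailoring rules: (character, character it follows).
CROATIAN_RULES = [("č", "c"), ("ć", "č"), ("đ", "d")]


def tailor(default_order, rules):
    """Return a copy of default_order with each rule applied in turn."""
    order = list(default_order)
    for char, after in rules:
        if char in order:
            order.remove(char)
        order.insert(order.index(after) + 1, char)
    return order


def make_sort_key(order):
    """Build a key function that ranks strings by the given order."""
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda s: [rank[ch] for ch in s]


if __name__ == "__main__":
    order = tailor(DEFAULT_ORDER, CROATIAN_RULES)
    print(order)
    print(sorted(["da", "ća", "ca", "ča"], key=make_sort_key(order)))
```

The point is only that a locale needs to state its differences,
not a complete table of its own.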

> Otherwise you might try http://www.alphabets-world.com/

Thank you very much! I feel rather stupid that I didn't find
that link.

> > So give me a link about Croatian sort order if you find one.
> http://www.hr/hrvatska/language/abeceda.en.htm
> You might want to compare it with Slovenian alphabet/sort order:
> http://www.ijs.si/slo-chset.html

And thanks for these, too.

But this is still not enough, and 'reverse-engineering' may
still be the easiest approach for getting complete
locale-specific collations. If I'm not mistaken, the pages you
found don't give the nitty-gritty details of, e.g., how to sort
the various accented characters in foreign words (Polish and
Czech differ in the relative order of á and ä, for whatever
reason).
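
By 'reverse-engineering' I mean something along these lines: feed
a list of characters to the platform's collation routine and
record the order that comes back. This is only a sketch in Python
using the standard locale module, not the tool I actually used.
The locale names in the example are assumptions (they differ
between Linux and Win32 and may not be installed), and the helper
names are made up.

```python
import locale
from functools import cmp_to_key


def collation_order(chars, compare=None):
    """Return the characters sorted by a two-argument comparison.

    With compare=None, locale.strcoll is used, i.e. whatever the C
    runtime provides for the current LC_COLLATE setting.
    """
    if compare is None:
        compare = locale.strcoll
    return sorted(chars, key=cmp_to_key(compare))


def dump_for_locale(chars, locale_name):
    """Sort chars under locale_name, then restore the old setting.

    Raises locale.Error if the platform does not know locale_name.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return collation_order(chars)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    sample = "aáäbcčćdđz"
    # Assumed locale names; adjust them for your platform.
    for name in ("cs_CZ.UTF-8", "pl_PL.UTF-8"):
        try:
            print(name, "".join(dump_for_locale(sample, name)))
        except locale.Error:
            print(name, "not available on this system")
```

Single characters only show part of the picture, so a real test
should also sort whole words to expose secondary differences.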

Although I would guess that the Win32 collating support is
inferior to Java and GNU, it's the easiest for me to check.
I've put the results online at
http://groups.yahoo.com/group/Firebird-Architect/files/charsets_and_collations/

Perhaps you can check the output for the languages you know.

Regards,
Peter Jacobi