firebird-architect - Collations (was Re: UTF-8 vs UTF-16)

Subject: Collations (was Re: UTF-8 vs UTF-16)
Author: peter_jacobi.rm
Post date: 2003-08-26T15:53:18Z

Hi Adem,

--- In Firebird-Architect@yahoogroups.com, "adem" wrote:

> It is not that I don't care (well, maybe it is, I
> suppose), but an array (or a column in a table called
> CHARSETS or something) seems to be able to replace the
> algo you describe below, and give me the freedom and the
> responsibility to specify *my own* collation order
> --especially if I am dealing with less than widely
> known languages.
>
> And, since it would be a simple lookup array, it stands
> a good chance that it will be faster.

Dare to take a peek at the sources?
This is the source used to implement the French collation:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/firebird/firebird2/src/intl/collations/bl88591fr0.h?rev=1.3&content-type=text/vnd.viewcvs-markup

(if some advanced technology has broken this URL across
several lines, it must be rejoined into one long line)

Things to learn:

1. It's four tables, not one, carefully packed together
using bitfields for efficiency.

2. I can write you a version of fbintl2.dll which
reads an external file to fill these tables. Next week
if you can pay; it will take a little longer if I do it
in my free time.

3. There isn't any speed advantage to gain here. This
is already carefully crafted code.

4. Handwritten collation tables from everybody would
be a support problem.
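To illustrate why one lookup array is not enough, here is a minimal Python sketch of a multi-level collation key. The bit layout (8 bits primary, 4 bits secondary, 2 bits tertiary), the weight values and the names `pack`, `unpack`, `TABLE` and `sort_key` are hypothetical and are not taken from bl88591fr0.h; only the general idea (several weights per character, packed into bitfields) follows the point above. The French-specific twist it shows is that traditional French ordering compares accent (secondary) differences from the end of the word, something a single rank per character cannot express.

```python
# Hypothetical multi-level collation table, loosely following the idea of
# packing several weights per character into bitfields. NOT Firebird's layout.

def pack(primary, secondary=0, tertiary=0):
    """Pack three weights into one int: 8 bits primary, 4 secondary, 2 tertiary."""
    if not (0 <= primary < 256 and 0 <= secondary < 16 and 0 <= tertiary < 4):
        raise ValueError("weight out of range for its bitfield")
    return (primary << 6) | (secondary << 2) | tertiary


def unpack(entry):
    """Split a packed entry back into (primary, secondary, tertiary)."""
    return entry >> 6, (entry >> 2) & 0xF, entry & 0x3


TABLE = {}
for rank, ch in enumerate("abcdefghijklmnopqrstuvwxyz", start=1):
    TABLE[ch] = pack(rank)                # base letter: primary weight only
    TABLE[ch.upper()] = pack(rank, 0, 1)  # upper case differs only at tertiary level

# Accented letters share the base letter's primary weight; the accent becomes
# a secondary weight (values chosen arbitrarily for this sketch).
for ch, base, accent in [("é", "e", 1), ("è", "e", 2), ("ê", "e", 3),
                         ("à", "a", 2), ("ô", "o", 3)]:
    TABLE[ch] = pack(unpack(TABLE[base])[0], accent)


def sort_key(word, french_accents=True):
    """Build a (primary, secondary, tertiary) key; unknown characters raise KeyError."""
    levels = [unpack(TABLE[ch]) for ch in word]
    primary = tuple(p for p, _, _ in levels)
    secondary = tuple(s for _, s, _ in levels)
    tertiary = tuple(t for _, _, t in levels)
    if french_accents:
        # Traditional French ordering compares accents from the end of the word.
        secondary = secondary[::-1]
    return primary, secondary, tertiary


words = ["côté", "coté", "côte", "cote"]
print(sorted(words, key=sort_key))
print(sorted(words, key=lambda w: sort_key(w, french_accents=False)))
```

With the backwards accent rule the words sort as cote, côte, coté, côté; comparing accents front to back instead gives cote, coté, côte, côté. The sketch deliberately ignores expansions and contractions (for example a ligature sorting as two letters), which a real collation also has to handle.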

> > http://groups.yahoo.com/group/Firebird-Architect/message/4828
>
> I have read it. And, ouch! It is a very good example
> of how the database developer needs to be an expert
> on linguistics... Is this really fair on the developers?

Of course. If you aren't interested in collation details,
accept the way the server does it as authoritative.

Using Win32 LCMapString to compare strings with your
default or chosen locale works the same way. So does
Collator.compare in Java.

So where's the problem?

All these localized comparisons may be very complex
internally, but they all guarantee:
1. Exactly one of these is true:
   s1 < s2
   s1 > s2
   s1 == s2
2. s1 < s2 <=> s2 > s1
3. s1 < s2 and s2 < s3 => s1 < s3

So they will all behave sanely, and you need not care
about the details.
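The three guarantees can be turned into a small test harness. In this Python sketch the names `compare`, `bad_compare` and `order_violations` are made up for illustration. A comparator is taken to return a negative, zero or positive number, as Java's `Collator.compare` does, so guarantee 1 holds by construction; the harness checks that every string compares equal to itself, that guarantee 2 holds for every pair and that guarantee 3 holds for every triple of samples. `compare` is only a stand-in for a real locale-aware collation: case-insensitive first, with case as a tie-breaker.

```python
from functools import cmp_to_key
from itertools import product


def sign(x):
    """Normalize a comparator result to -1, 0 or 1."""
    return (x > 0) - (x < 0)


def compare(a, b):
    """Stand-in collation: case-insensitive order, exact string breaks ties."""
    ka, kb = (a.casefold(), a), (b.casefold(), b)
    return (ka > kb) - (ka < kb)


def bad_compare(a, b):
    """Deliberately broken: claims a < b for every pair of distinct strings."""
    return -1 if a != b else 0


def order_violations(cmp, samples):
    """Return a list of violated guarantees (empty if the samples pass)."""
    violations = []
    # Guarantee 1 with a sign-valued comparator: s must compare equal to itself.
    for s in samples:
        if cmp(s, s) != 0:
            violations.append(("reflexivity", s))
    # Guarantee 2: s1 < s2 exactly when s2 > s1.
    for s1, s2 in product(samples, repeat=2):
        if sign(cmp(s1, s2)) != -sign(cmp(s2, s1)):
            violations.append(("antisymmetry", s1, s2))
    # Guarantee 3: s1 < s2 and s2 < s3 implies s1 < s3.
    for s1, s2, s3 in product(samples, repeat=3):
        if cmp(s1, s2) < 0 and cmp(s2, s3) < 0 and not cmp(s1, s3) < 0:
            violations.append(("transitivity", s1, s2, s3))
    return violations


words = ["apple", "Apple", "banana", "APPLE", "cherry"]
print(order_violations(compare, words))           # no violations expected
print(sorted(words, key=cmp_to_key(compare)))
print(order_violations(bad_compare, ["a", "b"]))  # reports violations
```

Because a sort routine only ever asks "is a before b?", any comparator that passes these checks can be handed to it without caring how the comparison works inside, which is the point made above.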

Regards,
Peter Jacobi