firebird-support - Re: confused with charset and collation

Subject	Re: confused with charset and collation
Author	peter_jacobi.rm
Post date	2004-06-07T07:08:56Z

Hi Didier!

"Didier Gasser-Morlay" <Didiergm@n...> wrote:

> B) see my questions inline

I'll try to clarify

> > In addition I can recommend reading the Unicode and
> > Dave's documentation about multi-level collation.
> >
> Where can I find it? I only found a direct link to ibcollate.

Dave's doc is included in the Collation "SDK":
http://www.brookstonesystems.com/CollateKit.zip

> > and rewrite the query to "BETWEEN "cafe" AND "cafezzz".
> [didier] with that multi-level collation it looks like the query must
> be run in lower case, doesn't it?

Use the lowest-valued variation of the string as the lower bound.
This is almost always the lower-case version. You can also
'decrement' the base string and append 'zzz':
BETWEEN 'Rhôndzzz' AND 'Rhônezzz' will match everything that
starts with 'Rhône' in all casing and accenting variations.
(This assumes that no real string in your DB contains 'zzz', which
would lead to some additional false positives.)
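
The 'decrement and append zzz' trick can be sketched in a few lines of
Python. This is only an illustration of how the two bounds are built, not
anything Firebird provides: the helper name prefix_bounds is made up here,
and it assumes the prefix ends in a plain letter from b to z, so that the
previous code point is also the previous letter in the collation.

```python
from typing import Tuple


def prefix_bounds(prefix: str, pad: str = "zzz") -> Tuple[str, str]:
    """Build (lower, upper) bounds for a BETWEEN that emulates a prefix match.

    Hypothetical helper for illustration only. It assumes the last character
    of the prefix is an unaccented letter other than 'a'.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    last = prefix[-1]
    if not ("b" <= last.lower() <= "z"):
        raise ValueError("last character must be a plain letter from b to z")
    # 'Decrement' the final character ('e' becomes 'd') and pad both bounds.
    lower = prefix[:-1] + chr(ord(last) - 1) + pad
    upper = prefix + pad
    return lower, upper


lower, upper = prefix_bounds("Rhône")
print(f"BETWEEN '{lower}' AND '{upper}'")
```

For 'Rhône' this prints the same bounds as in the example above.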

> > Both options were discussed in earlier threads.
> [Didier] What do you call a multi-level collation? I could not find
> any reference when searching the group.

In addition to Dave's doc, look here:
http://www.unicode.org/reports/tr10/

It's the process of splitting the string comparison
into multiple phases (levels), so that any difference in 'base character'
is considered a stronger difference than any difference
in accents, which itself is stronger than any difference
in casing. Equivalently, the string to be sorted is decomposed
into a sort key with one component per level (base characters,
accents, casing):

Rhône => RHONE-00300-10000
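
To make the decomposition concrete, here is a toy Python sketch of such a
three-level key. It is not the algorithm from Dave's kit or from TR10; the
accent weights are invented so that the circumflex gets the value 3, as in
the example above, and the function names are hypothetical.

```python
import unicodedata

# Assumed toy weights for combining marks; real collations use full tables.
ACCENT_WEIGHTS = {
    "\u0301": 1,  # combining acute
    "\u0300": 2,  # combining grave
    "\u0302": 3,  # combining circumflex
}


def multilevel_key(text):
    """Return (base letters, accent weights, case flags) for sorting."""
    base, accents, cases = [], [], []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        letter, marks = decomposed[0], decomposed[1:]
        base.append(letter.upper())                       # level 1: base character
        accents.append(ACCENT_WEIGHTS.get(marks, 9) if marks else 0)  # level 2
        cases.append(1 if letter.isupper() else 0)        # level 3: casing
    return ("".join(base), tuple(accents), tuple(cases))


def format_key(text):
    """Render the key in the BASE-accents-cases notation used above."""
    base, accents, cases = multilevel_key(text)
    return "-".join([base, "".join(map(str, accents)), "".join(map(str, cases))])


print(format_key("Rhône"))
words = ["rhone", "Rhône", "Rhone", "rhône", "Rhona"]
print(sorted(words, key=multilevel_key))
```

Because Python compares tuples element by element, a difference in the
base letters decides first, then accents, then casing. That is also why
the lower-case, unaccented variation comes out as the lowest value.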

> Re the non-standard nocase noaccent, I suppose you make a ref to
> Dave's work at brookstonesystems. It seems that it does not work in fb
> 1.5 nor on Linux. Both are showstoppers to me as even the construction
> kit says it does not work with 1.5.

It is told at the campfires that Dave's collation can be
made to work with FB 1.5 (on Win32):
- copy (not rename) his gdsintl2.dll to fbintl2.dll
- copy (not rename) FB's fbintl.dll to gdsintl.dll
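
If you prefer to script the two copies, a minimal Python sketch could look
like this. It assumes that Dave's gdsintl2.dll has already been placed in
the same directory as FB's fbintl.dll; the function name and the single
directory argument are my own invention, and the recipe itself remains
unverified hearsay.

```python
import shutil
from pathlib import Path


def apply_campfire_recipe(intl_dir):
    """Copy (not rename) the two DLLs as described in the steps above.

    Hypothetical helper: intl_dir is assumed to contain both
    gdsintl2.dll (from Dave's kit) and fbintl.dll (from FB 1.5).
    """
    intl = Path(intl_dir)
    # Dave's library under the name FB 1.5 is said to look for.
    shutil.copy2(intl / "gdsintl2.dll", intl / "fbintl2.dll")
    # FB's own library under the old gdsintl name.
    shutil.copy2(intl / "fbintl.dll", intl / "gdsintl.dll")
```

The originals stay in place, so undoing the change only means deleting
the two copies.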

For Linux you must ask Dave himself.

At least one brave user tried the nocase-noaccent collation
'LOADABLE' from my demo kit:
http://www.jodelpeter.de/i18n/fbarch/

With minor tweaks it should compile under Linux.

> [didier] with my, hopefully yet, limited understanding I'd say that
> ISO-8859-1 is enough

ISO-8859-1 has the widest range of culturally correct collations.
Just select the one matching your largest userbase or
customize on install.

Regards,
Peter Jacobi