firebird-architect - Re: Collations

Subject	Re: Collations
Author	peter_jacobi.rm
Post date	2003-06-18T08:43:29Z

Hi David, all,

Thanks for the additional URL and the
background information!

--- In Firebird-Architect@yahoogroups.com, "David Schnepper" wrote:
> > From a *very* first reading of the material offered, I have
> > the impression that Firebird implements the full glory of
> > UNICODE collating.
>
> Unless someone did something in Firebird while I was
> out to lunch (ok, so I've been out to lunch for about
> a year...) I think Firebird still follows the InterBase
> collation model.

So we must attribute this to a fine case of not
following the Not-Invented-Here approach: UNICODE
collation seems to be based on previous work done in SQL
(or are both based on an even older predecessor?).

Anyway, the Firebird collation architecture allows
UNICODE-conformant collating, as can be checked by comparing it with
http://www.unicode.org/reports/tr10/
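
To make the multi-level comparison idea in that report concrete, here is a rough sketch in Python. It is not the TR10 algorithm and not Firebird code: the function name `sort_key` is made up for illustration, and the three levels are approximated with the standard `unicodedata` module instead of a real collation element table.

```python
import unicodedata

def sort_key(s):
    """Toy three-level key: letters first, then accents, then case."""
    # Decompose so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    base = [c for c in decomposed if not unicodedata.combining(c)]
    marks = [c for c in decomposed if unicodedata.combining(c)]
    primary = tuple(c.casefold() for c in base)   # base letters only
    secondary = tuple(marks)                      # accents break primary ties
    tertiary = tuple(c.isupper() for c in base)   # case breaks remaining ties
    return (primary, secondary, tertiary)

words = ["Resume", "resumes", "r\u00e9sum\u00e9", "resume"]
print(sorted(words))                # plain code point order
print(sorted(words, key=sort_key))  # letters, then accents, then case
```

With plain code point order the accented word sorts last; with the layered key it sorts next to its unaccented spellings, and accent and case only decide between otherwise equal strings.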

> Most of the i18n work was done in 1991/1992. Back then
> Unicode was new (heck, Java JDK 1.0 didn't ship until
> 1995!!!). No collations were defined for the Unicode_FSS
> character set as it was unclear what the appropriate
> model would be (and, of course, lack of time before
> ship to implement...).

I see. In ye olde days, UNICODE was just another character set,
and a complicated one (though not as complicated as the ISO 2022
nightmare - thank all gods that this beast is almost dead).

The view that all 'legacy' character sets are just subsets of UNICODE
is of more modern origin. I've heard that this will also be
stated in SQL:200n.

So, at some point in the future, Firebird may make use of
this fact, but in the meantime the fact can already be used in the
implementation of collations without changing the interface.
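
As a small sketch of what that could mean in practice (Python, not Firebird code): a collation routine can map the bytes of any legacy character set to UNICODE first and then build one common key. The name `unicode_key` is hypothetical, and plain code point order merely stands in for a real collation.

```python
import unicodedata

def unicode_key(raw: bytes, charset: str) -> tuple:
    """Map legacy-encoded bytes to UNICODE, then derive a comparison key."""
    text = raw.decode(charset)                 # legacy bytes -> UNICODE text
    text = unicodedata.normalize("NFC", text)  # one canonical form
    # Assumption: code point order is only a placeholder for real collation.
    return tuple(ord(c) for c in text)

# The same word stored in three different character sets.
samples = [
    (b"caf\xe9", "latin-1"),
    (b"caf\x82", "cp437"),
    (b"caf\xc3\xa9", "utf-8"),
]
keys = [unicode_key(raw, charset) for raw, charset in samples]
print(keys[0] == keys[1] == keys[2])
```

The caller still passes bytes plus a character set name, so the interface stays the same; only the inside of the key routine goes through UNICODE.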

> Why aren't the 8859-1 drivers implemented for cp1252?
> Well, 8859-1 *IS* a different character set than cp1252.
> (cp1252 defines about 12 characters in positions that
> 8859-1 specifically defines as "not a character").

Yes, but nevertheless I would like to have the
<language>_<country> collations for the cp12<nn> and
UNICODE_FSS character sets. I will give it a try.
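
For anyone who wants to look at the positions in question, a quick check in Python follows. It relies on Python's own `cp1252` and `latin-1` codec tables, which reflect a later revision of cp1252 than the one available in 1992, so the count it prints need not match the rough figure quoted above. The helper name `cp1252_extras` is made up for this sketch.

```python
import unicodedata

def cp1252_extras():
    """Bytes 0x80..0x9F that cp1252 assigns to a character."""
    extras = {}
    for byte in range(0x80, 0xA0):
        raw = bytes([byte])
        try:
            extras[byte] = raw.decode("cp1252")
        except UnicodeDecodeError:
            continue  # position is unassigned in cp1252 as well
    return extras

extras = cp1252_extras()
for byte, ch in sorted(extras.items()):
    # Python's latin-1 codec maps this whole range to C1 control codes.
    control = bytes([byte]).decode("latin-1")
    print(f"0x{byte:02X}  cp1252: U+{ord(ch):04X}  "
          f"latin-1: U+{ord(control):04X} ({unicodedata.category(control)})")
print(len(extras), "positions differ")
```

Outside this range the two character sets agree, which is why a collation written for 8859-1 covers most, but not all, of cp1252.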

Regards,
Peter Jacobi