firebird-architect - RE: [Firebird-Architect] Re: Collations

Subject	RE: [Firebird-Architect] Re: Collations
Author	David Schnepper
Post date	2003-06-17T16:30:32Z

As the original author of InterBase's collation
and internationalization, I'll put my 2 cents
in.

First off, more info on building collations is
at: http://www.brookstonesystems.com under
the Interbase Collation kit.

>
> And back to the topic:
> From a *very* first reading of the material offered, I have
> the impression that Firebird implements the full glory of
> UNICODE collating. So it's even more astonishing to see that
> the collating is linked to the character encoding. Why would you
> have some collating available for 8859-1 but not for cp1252
> and vice versa?

Unless someone did something in Firebird while I was
out to lunch (ok, so I've been out to lunch for about
a year...) I think Firebird still follows the InterBase
collation model.

The InterBase model (leaving the question of Firebird open...)
followed the SQL 92 definition of collation, character set,
character repertoire, etc.

In the SQL 92 model, every collation is bound to a
character set, NOT a character repertoire. In more
practical terms, this means a collation is bound to
an implementation of characters, not a set of characters
(regardless of how the set is implemented).

Of course, internally you could implement the collation
with Unicode, and thus have the "same implementation"
work for multiple character sets, with just a remapping
layer. But that isn't the current implementation.
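To make the "remapping layer" idea concrete, here is a minimal sketch in Python. It is not how InterBase or Firebird works; the names unicode_sort_key and collation_key are made up for illustration, and the ordering rule (strip accents, fold case) is a toy stand-in for a real collation. The point is only that one Unicode-level comparison can serve several byte encodings once each is decoded first.

```python
import unicodedata


def unicode_sort_key(text: str) -> tuple:
    """Toy collation defined once, on Unicode text (hypothetical rule).

    Primary level: base letters with accents stripped and case folded.
    Tie-breaker: the original string, so the ordering stays deterministic.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), text)


def collation_key(raw: bytes, charset: str) -> tuple:
    """Remapping layer: decode the stored bytes, then reuse the shared key."""
    return unicode_sort_key(raw.decode(charset))


words = ["zebra", "Émile", "apple", "eagle"]

# The same collation logic applied to two different character sets.
latin1_sorted = sorted(
    (w.encode("latin-1") for w in words),
    key=lambda b: collation_key(b, "latin-1"),
)
cp1252_sorted = sorted(
    (w.encode("cp1252") for w in words),
    key=lambda b: collation_key(b, "cp1252"),
)

print([b.decode("latin-1") for b in latin1_sorted])
print([b.decode("cp1252") for b in cp1252_sorted])
```

Both lists come out in the same order because the ordering is decided after decoding, not on the raw byte values.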

Most of the i18n work was done in 1991/1992. Back then
Unicode was new (heck, Java JDK 1.0 didn't ship until
January 1996!). No collations were defined for the Unicode_FSS
character set because it was unclear what the appropriate
model would be (and, of course, there was no time left to
implement one before shipping).

The key requirement for InterBase in 1991/92 was compatibility
with Borland's dBASE and Paradox database products. (Remember
them?) So all the effort went into making drivers that
*exactly* emulated the dBASE and Paradox collation orders
(including all the known bugs in those collation orders...)

New drivers, implementing the best possible collation, were
written for ISO 8859-1.

As for why some drivers exist for cp1252/DOS and not for
ISO 8859: those are generally dBASE/Paradox collation orders,
which are buggy in many cases and were not going
to "go forward".

Why aren't the 8859-1 drivers implemented for cp1252?
Well, 8859-1 *IS* a different character set than cp1252.
(cp1252 defines about 12 characters in positions that
8859-1 specifically defines as "not a character").
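The difference between the two character sets can be checked with a short Python sketch. It relies on Python's built-in "cp1252" and "latin-1" codecs, which are an assumption about what counts as each character set: Python's latin-1 codec maps bytes 0x80-0x9F to C1 control codes rather than leaving them undefined, and its cp1252 table is the current one, which assigns 27 printable characters in that range. Earlier revisions of the code page assigned fewer (the euro sign, for example, was a later addition).

```python
import unicodedata

# Collect the byte positions where cp1252 has a printable character
# but latin-1 (as implemented by Python) only has a control code.
differing = []
for byte in range(0x80, 0xA0):
    raw = bytes([byte])
    try:
        ch = raw.decode("cp1252")
    except UnicodeDecodeError:
        continue  # position left unassigned in cp1252
    if unicodedata.category(ch) != "Cc":
        differing.append((byte, ch))

for byte, ch in differing:
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# Outside that range the two codecs agree byte for byte.
same_elsewhere = all(
    bytes([b]).decode("cp1252") == bytes([b]).decode("latin-1")
    for b in list(range(0x00, 0x80)) + list(range(0xA0, 0x100))
)
print(len(differing), same_elsewhere)
```

A collation table written against 8859-1 byte values therefore has no defined position for these cp1252 characters, which is one practical reason a driver cannot simply be reused across the two sets.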

Basically, the reason there's such a limited set of
drivers was "ran outta time" for the release of
InterBase 4.0 in 1992. After that, it was never
a priority from Marketing to improve collation/character
set, etc. Which is why very little work was done
in that area from 1992 to 1999 (when I left Borland/InterBase).

Dave

>
> Regards,
> Peter Jacobi