firebird-architect - Collations (was Re: UTF-8 vs UTF-16)

Subject	Collations (was Re: UTF-8 vs UTF-16)
Author	adem
Post date	2003-08-27T00:16:31Z

Hi Ann,

> > It is not that I don't care (well, maybe it is, I
> >suppose), but an array (or a column in a table called
> >CHARSETS or something) seems to be able to replace the
> >algo you describe below, and give me the freedom and the
> >responsibility to specify *my own* collation order
> >--especially if I am dealing with less than widely
> >known languages.
> >
> Sure, but so does the current arrangement. Firebird
> comes with a dll that supports as many collations as
> we consider important. If some application needs a
> Catalan collation and we don't support it, the
> developers can write their own dll, using the
> predefined interface.

Apart from coding and debugging the thing itself,
the problem with the current arrangement is that you have
to give the user login access to install a DLL (assuming
he/she can compile it in the first place). For an
RDBMS that prides itself on not requiring a team of DB
admins, one would really like to do without this.

Then again, I have a similar problem with db files:
I can create an employee.fdb on the remote server
through FB, but to remove that file I need to log in.

This is something I personally hate having to do.
I find it incredibly inconsistent, however solid the
counter-explanation is. I hope I don't get too many
flames for this :->

> The problem with an entirely "outside the box"
> collation is that it can't be used in an index
> so range requests that are satisfied by an index
> won't match the results of the same request executed
> without an index. That's not good.

I am not sure what you mean by 'entirely outside the
box'. The charset information will be in the box;
only, instead of a DLL, it will be in a table,
probably a special table. Would it still not work?
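
To make the question concrete, here is a minimal sketch (not
Firebird's actual implementation) of a collation held as table
rows. The row layout, the weights, and all function names are
assumptions made up for illustration. The point is only that an
index lookup and a plain scan agree as long as both derive their
sort keys from the same table.

```python
import bisect

# Hypothetical collation rows (character, weight), as they might be
# read from a special table; the weights are invented for the example.
COLLATION_ROWS = [("a", 10), ("b", 20), ("c", 30), ("ç", 35), ("d", 40), ("z", 90)]
WEIGHTS = dict(COLLATION_ROWS)


def sort_key(text):
    """Translate a string into a tuple of weights taken from the table."""
    return tuple(WEIGHTS[ch] for ch in text)


def build_index(values):
    """Build a sorted list of (key, value) pairs, standing in for an index."""
    return sorted((sort_key(v), v) for v in values)


def range_via_index(index, low, high):
    """Return values with low <= value <= high by bisecting the sorted index."""
    keys = [k for k, _ in index]
    start = bisect.bisect_left(keys, sort_key(low))
    end = bisect.bisect_right(keys, sort_key(high))
    return [v for _, v in index[start:end]]


def range_via_scan(values, low, high):
    """Return the same range without the index, by testing every value."""
    lo, hi = sort_key(low), sort_key(high)
    matches = [v for v in values if lo <= sort_key(v) <= hi]
    return sorted(matches, key=sort_key)


values = ["da", "ça", "ca", "ab", "zb", "cz"]
index = build_index(values)
```

With these rows, range_via_index(index, "ca", "ça") and
range_via_scan(values, "ca", "ça") return the same list. The
mismatch only appears if the index was built with one ordering
and the scan compares with another.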

If the charset info were in a table, backups would
be a lot less complicated too. You would take a
backup to another server and it would contain all
the necessary info.

> The other advantage of the current approach is that
> the developers of the Catalan collation can, if they
> choose, contribute it to Firebird to be integrated
> into the distributed dll.

Well, so could a .sql file...

> > I have read it. And, ouch! It is a very good example
> >of how the database developer needs to be an expert
> >on linguistics... Is this really fair on the developers?
> >
> Once a collation is done, it's no longer a problem for
> developers. If users of the collation find problems with
> it, then almost by definition, they're already experts
> and can at least help in creating a new collation or fixing
> the old one.

But why force the two to mix? That is, the 'expert' who
came up with his/her charset table and the 'expert' who
will code it.

From what I understand, the current collation code has some
form of intelligence to decide which chars come before
which. And I am sure a lot of hours/days have been spent
on making it do what it does. But at a time when neither
disk space, nor RAM, nor CPU power is as scarce as
it was then, why not sacrifice an infinitesimal amount
of each of them in the name of flexibility and --to
use a worn-out phrase-- freedom to the users? :-)

Ordinary DB users without C/C++ (or whatever) knowledge
could easily develop their own charset tables in
some simple GUI and upload them to the server. Then, if
they wish, they could send the .sql file over to you to
become part of the distribution. Why is this less desirable?
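
As a rough sketch of what such a tool might emit: the user lists
the characters in the order they should sort, and the tool writes
one INSERT per character. The CHARSETS table name comes from my
earlier suggestion; the column names COLLATION_NAME, CH and WEIGHT,
and the helper functions, are purely hypothetical.

```python
def weight_rows(alphabet):
    """Assign increasing weights to characters in the order the user listed them."""
    return [(ch, position) for position, ch in enumerate(alphabet, start=1)]


def sql_literal(text):
    """Quote a string for SQL, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def to_sql(collation_name, alphabet, table="CHARSETS"):
    """Build the text of a .sql script with one INSERT per character."""
    lines = []
    for ch, weight in weight_rows(alphabet):
        lines.append(
            f"INSERT INTO {table} (COLLATION_NAME, CH, WEIGHT) "
            f"VALUES ({sql_literal(collation_name)}, {sql_literal(ch)}, {weight});"
        )
    return "\n".join(lines)


# Example: a five-letter ordering that places 'ç' between 'c' and 'd'.
script = to_sql("MY_ORDER", "abcçd")
```

The resulting script is plain text, so it could be mailed around,
reviewed, and loaded on another server like any other .sql file.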

Cheers,
Adem