Subject The case of uppercasing (+charset tailoring) [long]
Author peter_jacobi.rm
Dear All,

A frequent surprise when using Firebird's
character sets is the uppercasing
behaviour.

The default (codepoint order) collations only
uppercase a-z and leave all other codepoints
unchanged. Under pressure from users, this has
already been changed for some Cyrillic charsets.
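
To make the difference concrete, here is a minimal sketch in
Python (not Firebird's actual code). The helper name
ascii_only_upper is made up for illustration; it mimics the
a-z-only behaviour described above, and Python's built-in
str.upper() stands in for default Unicode uppercasing.

```python
def ascii_only_upper(text: str) -> str:
    """Uppercase a-z only and leave every other codepoint unchanged."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


sample = "müller café"
print(ascii_only_upper(sample))  # MüLLER CAFé
print(sample.upper())            # MÜLLER CAFÉ
```

The first result is what surprises users: the accented
letters stay lowercase in the middle of an uppercased word.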

I'm not an expert in Interbase history, but I
assume this problem originated from overly
pessimistic thinking about culturally correct
uppercasing. "As uppercasing is culture-dependent",
the reasoning may go, "we can do nothing sensible
in the default collation".

But this is totally out of proportion, as
only a tiny percentage of uppercasing is culture
dependent, and you are a true Unicode expert if
you can name another area of dependencies besides
the relatively well known Turkish dotless I case.

So, in my (in this case not so humble) opinion, this
behaviour must go and uppercasing must become default
Unicode uppercasing for all character sets (plus minor
tailoring for Turkish).
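
As a rough sketch of what "default Unicode uppercasing plus
minor tailoring for Turkish" could look like, here it is in
Python. The function name upper_tailored and its turkish flag
are hypothetical; a real implementation would live in the
collation driver and be selected by character set or collation,
not by a boolean argument.

```python
def upper_tailored(text: str, turkish: bool = False) -> str:
    """Default Unicode uppercasing with an optional Turkish tailoring."""
    if turkish:
        # Turkish: dotted i uppercases to dotted capital I (U+0130).
        # Dotless i (U+0131) already maps to plain I by default.
        text = text.replace("i", "\u0130")
    return text.upper()


city = "diyarbak\u0131r"  # contains both dotted i and dotless i
print(upper_tailored(city))                # DIYARBAKIR
print(upper_tailored(city, turkish=True))  # DİYARBAKIR
```

The tailoring touches one letter only; everything else goes
through the same default mapping.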

The question is about the mode and schedule of deployment.

Changing the uppercasing behaviour may break existing
databases (but the change for Cyrillic went through
without major requests to kill somebody for it).

So, are we doomed to leave default collation uppercasing
as-is, or should it just be corrected, hoping for the best?

Is there a "dialect 4" in sight? Or should it only be done
together with a totally new ODS?

Touching a vaguely related point, the problem with the
default collation is not only that it's the default. It's
also the only one that can easily be specified for a
complete database, as the db's default character set can
be set, but not the db's default collation.

This gap can be closed with the SQL standard's (otherwise
rather useless) feature for dynamically creating new character
sets, as you can create a new character set that has the desired
collation as its default.

Are there opinions on this idea?

Regards,
Peter Jacobi