Subject Character sets and collations of columns
Author Tomasz Tyrakowski

One of the legacy databases I happen to maintain has character sets and
collations messed up. As far as I know, the database was created with
default character set "NONE" instead of "WIN1250", which should have
been set back then (and it's over 10 years old).
Then, some columns have been added with the character set option set
explicitly to WIN1250 (alter table X add Y varchar(n) character set
WIN1250), while the old columns still remained as "NONE".
The clients were windows apps and used WIN1250 when connecting to the
Later on, another maintainer changed the RDB$COLLATION_ID flags for some
of the columns in RDB$RELATION_FIELDS to indicate WIN1250 (value 3 as
far as I know, although I'm not entirely sure if this flag actually
concerns character sets, or only collation order - maybe the maintainer
was convinced he was clever, while in fact nothing relevant happened),
and set the default character set of the database to WIN1250 (in a way
unknown to me - maybe even via binary editing the GDB file and changing
some byte-flags inside).
So now the DB is kind of a mess (despite reporting dutifully WIN1250 as
the default character set), with some columns created when the DB had
"NONE" character set, some other create with character set WIN1250 set
explicitly upon creation, yet another created without explicit character
set but when the database has already had the "WIN1250" default
character set, and possibly some columns, for which the RDB$COLLATION_ID
was set by hand.
So my question is (in fact I'm not even sure what the right question is
:( ): assuming the strings in the database are in fact WIN1250-encoded
(all the clients used WIN1250), is setting the RDB$COLLATION_ID on all
text columns a good idea and would actually make any difference?
All strange quirks that happen from time to time with this database are
related to national characters. Sometimes a sort doesn't work as
expected, sometimes a client function (like strtoupper in PHP) doesn't
work on data from some of the columns (but works for other data). Is
there a chance that setting the collation flag might help? Or is there
another way to set it right?
I know the "right" right thing to do (pun intended) is to create a new
DB and pump the data, but it has about 400 interconnected tables and
over 60GB of size, so if there is another way, I would be grateful to
hear about it first.

Thanks in advance and sorry for this lengthy story.


__--== Tomasz Tyrakowski ==--__
__--== SOL-SYSTEM ==--__
__--== ==--__