firebird-support - Re: [firebird-support] Character sets and collations of columns

Subject	Re: [firebird-support] Character sets and collations of columns
Author	robert rottermann
Post date	2016-08-26T12:35:49Z

I have no idea at on how to fix it "correctly", but would it not be an
idea just to dump the tables with weird character into a text file, and
search replace them.

The wrong chars are probably uniformly wrong, so you should be able to
fix them using sed or some such tool (on linux, I am not well versed
with windows)

afterwards you clean up the db setting (droping it?) and the load the db
again

robert

On 26.08.2016 13:58, Tomasz Tyrakowski t.tyrakowski@...
[firebird-support] wrote:

> Hi,
>
> One of the legacy databases I happen to maintain has character sets and
> collations messed up. As far as I know, the database was created with
> default character set "NONE" instead of "WIN1250", which should have
> been set back then (and it's over 10 years old).
> Then, some columns have been added with the character set option set
> explicitly to WIN1250 (alter table X add Y varchar(n) character set
> WIN1250), while the old columns still remained as "NONE".
> The clients were windows apps and used WIN1250 when connecting to the
> database.
> Later on, another maintainer changed the RDB$COLLATION_ID flags for some
> of the columns in RDB$RELATION_FIELDS to indicate WIN1250 (value 3 as
> far as I know, although I'm not entirely sure if this flag actually
> concerns character sets, or only collation order - maybe the maintainer
> was convinced he was clever, while in fact nothing relevant happened),
> and set the default character set of the database to WIN1250 (in a way
> unknown to me - maybe even via binary editing the GDB file and changing
> some byte-flags inside).
> So now the DB is kind of a mess (despite reporting dutifully WIN1250 as
> the default character set), with some columns created when the DB had
> "NONE" character set, some other create with character set WIN1250 set
> explicitly upon creation, yet another created without explicit character
> set but when the database has already had the "WIN1250" default
> character set, and possibly some columns, for which the RDB$COLLATION_ID
> was set by hand.
> So my question is (in fact I'm not even sure what the right question is
> :( ): assuming the strings in the database are in fact WIN1250-encoded
> (all the clients used WIN1250), is setting the RDB$COLLATION_ID on all
> text columns a good idea and would actually make any difference?
> All strange quirks that happen from time to time with this database are
> related to national characters. Sometimes a sort doesn't work as
> expected, sometimes a client function (like strtoupper in PHP) doesn't
> work on data from some of the columns (but works for other data). Is
> there a chance that setting the collation flag might help? Or is there
> another way to set it right?
> I know the "right" right thing to do (pun intended) is to create a new
> DB and pump the data, but it has about 400 interconnected tables and
> over 60GB of size, so if there is another way, I would be grateful to
> hear about it first.
>
> Thanks in advance and sorry for this lengthy story.
>
> regards
> Tomasz
>