Subject Re: [firebird-support] How to change charset of a DB ?
Author Helen Borrie
At 02:22 AM 23/10/2010, you wrote:
>Hello all,
>
>After a successful upgrade from 2.1 to 2.5 I realized that I was not too
>carefull about the character settings so far.

..a crocodile waiting for its chance to bite..

>I want now to ensure that UTF-8 is in place in the whole DB.

A database has the default character set NONE unless you create the database with the desired default character set - see the language docs and updates at the DocIndex page.

>1) adapt db settings: alter character set UTF-8;

This would not do anything to existing data or metadata.

>2) Check all char/vchar fields: most of them have CHARACTER SET NONE

You can discover this from a metadata extraction tool - isql has one; most third-party toolsets have better ones.

>COLLATE NONE or even no character set definition on field level at all

The default collation is always the binary one. Charset NONE has no other collation available. It knows only how to uppercase the 7-bit characters; everything beyond dec 128 is "lost souls".

>3) Check the ODBC settings to use UTF-8 as well

Ask about that on the ODBC-devel list. Each client interface has its own way to pass connection parameters.

>Questions:
>a) In order ensure that all existing char/varchar fields are storing the
>data in UTF-8 do need to export the whole DB and import structure and
>data again?

Yes. Basically, get a metadata extract, edit it to reflect the new requirements and then pump the data across from the existing database.

>b) Do I need to adapt a char/varchar fields and set them to empty, NONE
>or UTF-8 ?

No. If you intend for all character fields to be UTF-8 then your datapump will take care of it. If you require a particular collation then put it into the column definition when you edit the metadata script.

>c) What happens if a client does not define a charset or defines a
>different one than UTF-8 ?

Strange things can happen, especially in the lower sub-releases of v.2.1.x...although the Intl machine from 2.1.3 onward is pretty good at telling you about mismatches and blocking them with exceptions. Set your client to the charset that is most compatible with the input data and Intl will do the right thing regarding storage and retrieval.

If you have a need to store certain columns in a different charset, that is incompatible with the client environment, then it gets more tricky - use of casting or introducer syntax in the client application may become necessary. It would normally make more sense to store the data in UTF-8 and apply a specific collation to that column.

Ask in firebird-tools about data pumping utilities you might experiment with.

./heLen