firebird-support - COLLATE usage for blob and varchar fields on an UTF8 database

Subject	COLLATE usage for blob and varchar fields on an UTF8 database
Author	patrick_marten
Post date	2012-07-02T13:31:50Z

Hello,

currently I have a database with ISO8859_1 as character set. This character set applies to most fields, but some have a different one, so that a table looks like this:

CREATE TABLE MY_TABLE
(
...,
DESCRIPTION1 BLOB SUB_TYPE 1 SEGMENT SIZE 80,
DESCRIPTION2 BLOB SUB_TYPE 1 SEGMENT SIZE 80 CHARACTER SET ISO8859_5,
SHORT_DESCRIPTION1 VARCHAR( 100),
SHORT_DESCRIPTION2 VARCHAR( 100) CHARACTER SET ISO8859_5,
SOME_TEXTFIELD VARCHAR( 50) COLLATE ISO8859_1,
...;
);

We are planning to convert everything to unicode and have created a database script + a database with UTF8 as character set, so that we can continue with development and then see, how to convert existing customer databases etc. (this will probably lead to a separate topic here with a lot of questions ;) )

Unfortunately I don't remember how we had created our current testing-database, so that a few things need to be clarified.

Another step will be, to change some blob fields from SUB_TYPE 1 to SUB_TYPE 0, because the component we will be using on the UTF8 version is working with a binary format for descriptions etc., so that the similar part of the new script (UTF8 database) looks like this:

CREATE TABLE MY_TABLE
(
...,
DESCRIPTION1 BLOB SUB_TYPE 0 SEGMENT SIZE 80 COLLATE UNICODE,
DESCRIPTION2 BLOB SUB_TYPE 0 SEGMENT SIZE 80 COLLATE UNICODE,
SHORT_DESCRIPTION1 VARCHAR( 100) COLLATE UNICODE,
SHORT_DESCRIPTION2 VARCHAR( 100) COLLATE UNICODE,
SOME_TEXTFIELD VARCHAR( 50) COLLATE UNICODE,
...;
);

IBExpert shows me current UTF8 table as follows:

Field Name | Field Type | Size | Subtype | Charset | Collate |
-------------------------------------------------------------
DESCRIPTION1 | BLOB | 80 | Binary | UTF8 | UTF8 (*) |
DESCRIPTION2 | BLOB | 80 | Binary | UTF8 | UTF8 (*) |
SHORT_DESCRIPTION1 | VARCHAR | 100 | | UTF8 | UNICODE |
SHORT_DESCRIPTION1 | VARCHAR | 100 | | UTF8 | UNICODE |
SOME_TEXTFIELD | VARCHAR | 50 | | UTF8 | UNICODE |

Collate for the first two fields is different in IBExpert than in the script.

I've needed to add some fields to the table and tried this, based on the UTF8 script:

alter table MY_TABLE add ANOTHER_DESCRIPTION BLOB SUB_TYPE 0 SEGMENT SIZE 80 COLLATE UNICODE;

but this fails with an error like "Invalid datatype. Incorrect usage of Character set or Collate".

In order to work it had to be:

alter table MY_TABLE add ANOTHER_DESCRIPTION BLOB SUB_TYPE 0 SEGMENT SIZE 80;

So the first question is, how is it possible, that IBExpert shows me UTF8 for the "Collate" column (see above at the (*) symbol).

Because of the error with alter table, I thought that I must have created the table without collate and just forgott to edit the script.
I thought, that IBXpert automatically shows UTF8 as character set and collate in such a case, i.e. if nothing was specified for those fields.

But for the newely added field "ANOTHER_DESCRIPTION" IBExpert shows this:
Field Name | Field Type | Size | Subtype | Charset | Collate |
-------------------------------------------------------------
DESCRIPTION1 | BLOB | 80 | Binary | | |

So now I'm somehwat confused and am not sure, what is correct and why there are such differences and how to use collate properly in that case.

Please put some light into the dark...

Kind regards,
Patrick