firebird-support - Re: UTF8 and UNICODE

Subject	Re: UTF8 and UNICODE_FSS
Author	woodwardtimothy76
Post date	2008-01-13T22:19:19Z

Thank you Helen and Dmitry for the replies.

> For UTF8, think in terms of collations. If you can find a
> collation that supports the characters and mappings that you want
> to use, either "in the box" or in the ICU databases, then it's
> supported.

The Firebird 2.0 Release Notes mention two collations: UCS_BASIC and
UNICODE.

Are there any others that are widely used?

> UNICODE_FSS stores all characters as exactly 3 bytes, even those
> that are shorter. It can't store characters of 4 or more bytes.

So UNICODE_FSS cannot represent characters outside the Basic
Multilingual Plane (code points above U+FFFF)?
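
To make the byte arithmetic behind that question concrete, here is a
minimal Python sketch. It is plain Python and says nothing about
Firebird internals; it only shows how many bytes standard UTF-8 needs
per character. The sample characters and the helper name utf8_len are
my own choices for illustration.

```python
# How many bytes does standard UTF-8 need for one character?
# Illustration only: this is not Firebird code.

def utf8_len(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))

# Sample characters chosen for illustration.
samples = {
    "A": "U+0041 (ASCII)",
    "\u00e9": "U+00E9 (Latin small e with acute)",
    "\u20ac": "U+20AC (euro sign, inside the BMP)",
    "\uffff": "U+FFFF (last code point of the BMP)",
    "\U00010000": "U+10000 (first code point outside the BMP)",
    "\U0001d11e": "U+1D11E (musical G clef, outside the BMP)",
}

for ch, label in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")
```

Every code point up to U+FFFF fits in at most 3 bytes, and everything
above it needs 4, so a 3-byte limit per character and a limit to the
BMP would amount to the same thing.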

> The max length of a UTF8 varchar is determined by the number of
> bytes in the *largest* character addressed by the collation, as
> declared in the manifest and to the database.

Sorry for the dumb question, but what is a manifest?

> In this example
> you have a collation which allows for a largest character of 4
> bytes. Since validation happens on byte length, it's possible
> for a string longer than the declared char_length to pass
> validation if the value actually doesn't include many
> characters that large.

I may have misunderstood, but I thought this was one of the issues
addressed when UNICODE_FSS was superseded by UTF8.

For example, consider the following SQL statement, where FIELD1 is of
type VARCHAR(1):

INSERT INTO TABLE1 (FIELD1) VALUES ('xyz')

This raises an exception (as it should) if FIELD1 is UTF8, but not if
FIELD1 is UNICODE_FSS.
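
The difference can be modelled with a small Python sketch. This is a
hypothetical model of the two kinds of length check, not Firebird's
actual implementation: the function names are invented, and the figure
of 3 bytes per character for UNICODE_FSS is taken from the quoted
reply above.

```python
# Hypothetical model of two ways to validate a string against VARCHAR(n).
# Neither function is Firebird code; both exist only to show the arithmetic.

def passes_byte_check(value: str, declared_chars: int, max_bytes_per_char: int) -> bool:
    """Accept the value if its encoded size fits the reserved byte buffer."""
    reserved_bytes = declared_chars * max_bytes_per_char
    return len(value.encode("utf-8")) <= reserved_bytes

def passes_char_check(value: str, declared_chars: int) -> bool:
    """Accept the value only if its character count fits the declared length."""
    return len(value) <= declared_chars

# VARCHAR(1) with 3 bytes reserved per character (assumed for UNICODE_FSS).
print(passes_byte_check("xyz", 1, 3))   # 'xyz' is 3 bytes, buffer is 3 bytes
print(passes_char_check("xyz", 1))      # 'xyz' is 3 characters, limit is 1
```

Under this model a byte-only check lets 'xyz' into a one-character
column, while a character-count check rejects it, which matches the
behaviour I see with UNICODE_FSS and UTF8 respectively.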

Regards,
Timothy Woodward