Subject Re: UTF8 and UNICODE_FSS
Author woodwardtimothy76
Thank you Helen and Dmitry for the replies.


> For UTF8, think in terms of collations. If you can find a
> collation that supports the characters and mappings that you want
> to use, either "in the box" or in the ICU databases, then it's
> supported.

The Firebird 2.0 Release Notes mention two collations: UCS_BASIC and
UNICODE.

Are there any others that are widely used?


> UNICODE_FSS stores all characters as exactly 3 bytes, even those
> that are shorter. It can't store characters of 4 or more bytes.

So UNICODE_FSS cannot represent characters outside the Basic
Multilingual Plane (code points above U+FFFF)?
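
To show why I ask: in UTF-8, code points up to U+FFFF encode to at most
3 bytes, while anything above U+FFFF needs 4. Here is a small Python
sketch of that arithmetic. It is not Firebird code, and the helper names
utf8_len and fits_in_three_bytes are made up for illustration only.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_three_bytes(ch: str) -> bool:
    """True if the character would fit in a 3-byte slot (assumed FSS limit)."""
    return utf8_len(ch) <= 3


if __name__ == "__main__":
    # One ASCII letter, two BMP characters, and one character beyond the BMP.
    for ch in ["A", "\u00e9", "\u20ac", "\U0001D11E"]:
        print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s), "
              f"fits in 3 bytes: {fits_in_three_bytes(ch)}")
```

If the 3-byte limit quoted above is right, the last character (U+1D11E,
a musical symbol outside the BMP) is the kind that could not be stored.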


> The max length of a UTF8 varchar is determined by the number of
> bytes in the *largest* character addressed by the collation, as
> declared in the manifest and to the database.

Sorry for the dumb question, but what is a manifest?


> In this example
> you have a collation which allows for a largest character of 4
> bytes. Since validation happens on byte length, it's possible
> for a string longer than the declared char_length to pass
> validation if the value actually doesn't include many
> characters that large.

I may have misunderstood, but I thought this was one of the issues
addressed when UNICODE_FSS was superseded by UTF8.

For example, consider the following SQL statement, where FIELD1 is of
type VARCHAR(1):

INSERT INTO TABLE1 (FIELD1) VALUES ('xyz')

This raises an exception (as it should) if FIELD1 is UTF8, but not if
FIELD1 is UNICODE_FSS.
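
To make sure I am reading the explanation correctly, here is how I
picture the two kinds of check, as a Python sketch. This is only my
assumption about the logic, not the engine's actual code, and the
function names byte_check and char_check are hypothetical.

```python
def byte_check(value: str, declared_chars: int, max_bytes_per_char: int) -> bool:
    """Validation on byte length only: the encoded value must fit in
    declared_chars * max_bytes_per_char bytes."""
    return len(value.encode("utf-8")) <= declared_chars * max_bytes_per_char


def char_check(value: str, declared_chars: int) -> bool:
    """Validation on character count: the value may not contain more
    characters (code points) than declared."""
    return len(value) <= declared_chars


if __name__ == "__main__":
    # VARCHAR(1) with an assumed 3 bytes per character: 'xyz' is 3 bytes,
    # so a byte-only check accepts it even though it has 3 characters.
    print(byte_check("xyz", 1, 3))   # byte-only check
    print(char_check("xyz", 1))      # character-count check
```

Under that reading, a byte-only check would let 'xyz' into a VARCHAR(1)
column, which matches what I see with UNICODE_FSS, while a check on
character count would reject it, which matches what I see with UTF8.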


Regards,
Timothy Woodward