Subject: Re: [Firebird-Architect] UTF-8 over UTF-16 WAS: Applications of Encoded Data Streams
Author: Jim Starkey
Svend Meyland Nicolaisen wrote:

>Yes, the Unicode Collation standard has a very good and detailed description
>of how to implement collations for the entire Unicode character set for
>different locales. One of the problems with Unicode collations is that the
>sort key needs about four bytes per character (for some special characters
>even more), even for strings that can be compressed using UTF-8 or UTF-16.
>This means that a limit on the maximum size of indices can quickly become a
>problem. I understand that Firebird 2 has an index size limit of 25% of the
>page size in use. It is much better than the limit set by InterBase (256
>characters?) but might not be good enough for Unicode sort keys.
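A rough check of the concern above, assuming the 25%-of-page-size limit and the flat four bytes of sort key per character stated in the quoted message (real sort keys carry extra overhead, so treat these as upper bounds). `max_indexable_chars` is a hypothetical helper for illustration, not a Firebird API.

```python
# Back-of-the-envelope estimate only; this is not Firebird's actual index key format.
SORT_KEY_BYTES_PER_CHAR = 4  # approximate figure from the quoted message
INDEX_LIMIT_DIVISOR = 4      # "25% of the page size", as stated in the quoted message


def max_indexable_chars(page_size: int, bytes_per_char: int = SORT_KEY_BYTES_PER_CHAR) -> int:
    """Longest string whose key fits under the assumed index key limit."""
    key_limit = page_size // INDEX_LIMIT_DIVISOR
    return key_limit // bytes_per_char


for page_size in (4096, 8192, 16384):
    # Compare a 4-byte-per-character sort key with 1-byte UTF-8 text in the ASCII range.
    print(page_size, max_indexable_chars(page_size), max_indexable_chars(page_size, 1))
```

Under these assumptions a 4096-byte page allows about 256 characters of Unicode sort key, versus 1024 characters of plain ASCII-range UTF-8, which is the gap the quoted message is worried about.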

Collations and encoding are separate problems, except, perhaps, in the
code that implements the collation. The fact that the Unicode guys have
a universal, worldwide collation doesn't mean we have to use it.
People seem quite happy with character-set-specific collations, but if
somebody wanted to implement the universal collation, that would work, too.
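A minimal sketch of the point that encoding and collation are separate layers: the bytes are decoded first (UTF-8 or UTF-16, it doesn't matter), and the collation only ever sees the decoded characters. `collation_key` here is a hypothetical toy collation (Unicode NFKD normalization, accent stripping, case folding); it is NOT the Unicode Collation Algorithm, just a stand-in for "some collation".

```python
import unicodedata


def collation_key(text: str) -> str:
    """Toy collation key: decompose, drop combining accents, case-fold.
    A hypothetical stand-in, not the Unicode Collation Algorithm."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def key_from_bytes(data: bytes, encoding: str) -> str:
    # The encoding layer: turn stored bytes back into characters.
    # The collation layer never sees the bytes, only the decoded string.
    return collation_key(data.decode(encoding))


word = "Résumé"
k8 = key_from_bytes(word.encode("utf-8"), "utf-8")
k16 = key_from_bytes(word.encode("utf-16-le"), "utf-16-le")
print(k8 == k16)  # True: same key regardless of storage encoding

names = ["zebra", "Émile", "apple", "Eve"]
print(sorted(names, key=collation_key))  # ['apple', 'Émile', 'Eve', 'zebra']
```

Swapping `collation_key` for a character-set-specific collation, or for a full implementation of the universal one, would not touch the encoding code at all.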


Jim Starkey
Netfrastructure, Inc.
978 526-1376