firebird-architect - Re: [Firebird-Architect] UTF-8 and Compression

Subject	Re: [Firebird-Architect] UTF-8 and Compression
Author	Jim Starkey
Post date	2005-03-01T01:16:23Z

Leyne, Sean wrote:

>Jim,
>
>>I've been thinking about compression and Olivier's stunning suggestion
>>to switch the engine to all UTF-8.
>
>Stunning good or stunning bad?

Stunningly brilliant (i.e. better than good). We currently have an
n-squared problem of character sets confounded by collating sequences.
Switching to a universal internal representation reduces it to a problem
linear in the number of collating sequences, because it breaks the
binding between character sets and collations. A character set becomes
nothing more than a bidirectional mapping between its own encoding and
UTF-8. A collation becomes a simple object that compares two UTF-8
strings, generates a key, upcases, and downcases (what have I neglected
here?).

In the new API we can probably isolate character set conversions to the
client, leaving the engine with a single internal representation and
collation sequences. The legacy API will still need to support
per-SQLVAR character sets, but the "new API" can probably get away with
pure UTF-8. New layered APIs, the formalization of IscDbc, can be defined
with a single per-session locale, which fits the Java model and simple
sanity nicely.
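To make the split concrete, here is a minimal sketch, assuming a design in which a character set object only converts to and from UTF-8 and a collation object only ever sees UTF-8. The class names `CharacterSet`, `Collation`, `BinaryCollation` and `CaseInsensitiveCollation` are hypothetical, and Python codecs stand in for whatever conversion tables the engine or client would really use; this is an illustration of the idea, not Firebird code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSet:
    """Hypothetical: a character set is only a two-way mapping to/from UTF-8."""
    name: str
    codec: str  # Python codec name, used here purely for illustration

    def to_utf8(self, data: bytes) -> bytes:
        # Native encoding -> UTF-8 (e.g. done on the client before sending).
        return data.decode(self.codec).encode("utf-8")

    def from_utf8(self, data: bytes) -> bytes:
        # UTF-8 -> native encoding (e.g. done on the client after fetching).
        return data.decode("utf-8").encode(self.codec)


class Collation(ABC):
    """Hypothetical: a collation knows nothing about character sets, only UTF-8."""

    @abstractmethod
    def key(self, s: bytes) -> bytes:
        """Return a byte string whose ordering defines this collation."""

    def compare(self, a: bytes, b: bytes) -> int:
        # Returns -1, 0 or 1 by comparing the generated keys.
        ka, kb = self.key(a), self.key(b)
        return (ka > kb) - (ka < kb)

    def upcase(self, s: bytes) -> bytes:
        return s.decode("utf-8").upper().encode("utf-8")

    def downcase(self, s: bytes) -> bytes:
        return s.decode("utf-8").lower().encode("utf-8")


class BinaryCollation(Collation):
    def key(self, s: bytes) -> bytes:
        # Byte order of valid UTF-8 matches Unicode code point order.
        return s


class CaseInsensitiveCollation(Collation):
    def key(self, s: bytes) -> bytes:
        # Case folding makes "Müller" and "MÜLLER" produce the same key.
        return s.decode("utf-8").casefold().encode("utf-8")
```

A possible usage: a Latin-1 client value is converted once at the boundary, and any collation can then operate on the stored UTF-8 without knowing where it came from.

```python
latin1 = CharacterSet("ISO8859_1", "latin-1")
stored = latin1.to_utf8("Müller".encode("latin-1"))  # engine sees only UTF-8
ci = CaseInsensitiveCollation()
ci.compare(stored, "MÜLLER".encode("utf-8"))  # 0
```

Adding another character set then means adding one mapping, and adding another collation means adding one UTF-8 comparator; neither has to be written once per pairing.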
