Subject Re: [Firebird-Architect] UTF-8 and Compression
Author Arno Brinkman
Hi,

> Stunningly brilliant (i.e. better than good). We currently have an
> n-squared problem of character sets confounded by collating sequences.
> Switching to a universal internal representation reduces it to a problem
> linear with the number of collating sequences. It breaks the binding
> between character sets and collations. A character set becomes a
> bidirectional mapping between the character set and UTF-8. A collation
> becomes a simple object that compares two UTF-8 strings, generates a
> key, upcases, and downcases (what have I neglected here?). In the new
> API we can probably isolate character set conversions to the client,
> leaving the engine with a single internal representation and collation
> sequences. The legacy API will need to support per-SQLVAR character
> sets, but the "new API" can probably get away with pure UTF-8. New
> layered APIs, the formalization of IscDbc, can be defined with a single
> per-session locale, which fits the Java model and simple sanity nicely.
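
To make sure I read the "bidirectional mapping" part correctly, here is a minimal Python sketch of it. The class and method names (CharacterSet, to_utf8, from_utf8) are hypothetical and only for illustration; nothing here is actual Firebird code, and Python's codec registry merely stands in for the conversion tables.

```python
import codecs


class CharacterSet:
    """Hypothetical: a character set is nothing more than a two-way mapping to UTF-8."""

    def __init__(self, name: str):
        # codecs.lookup raises LookupError for an unknown character set name.
        self.name = codecs.lookup(name).name

    def to_utf8(self, raw: bytes) -> bytes:
        # Client representation -> single internal representation.
        return raw.decode(self.name).encode("utf-8")

    def from_utf8(self, utf8: bytes) -> bytes:
        # Internal representation -> client representation.
        return utf8.decode("utf-8").encode(self.name)


latin1 = CharacterSet("iso8859-1")
wire = "caf\u00e9".encode("latin-1")   # bytes as a Latin-1 client would send them
internal = latin1.to_utf8(wire)        # what the engine would see
round_trip = latin1.from_utf8(internal)
```

With this shape, adding a character set means adding one mapping and touches no collation.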

How about indexes (keys) that need to reflect the exact collation you defined
(case-insensitive / characters with equal weight)? It is important to store the
key in the exact character repertoire, or did I miss something in the proposal?
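
To illustrate what I mean, a minimal Python sketch, assuming the index key would be generated by the collation from the UTF-8 value. All names are hypothetical, and case folding followed by byte order is only a stand-in for real collation weights.

```python
import unicodedata


class CaseInsensitiveCollation:
    """Hypothetical collation object working purely on UTF-8 input."""

    def sort_key(self, utf8: bytes) -> bytes:
        # Strings that the collation treats as equal must yield the same key.
        text = unicodedata.normalize("NFC", utf8.decode("utf-8"))
        return text.casefold().encode("utf-8")

    def compare(self, a: bytes, b: bytes) -> int:
        # Returns -1, 0 or 1, derived from the generated keys.
        ka, kb = self.sort_key(a), self.sort_key(b)
        return (ka > kb) - (ka < kb)


coll = CaseInsensitiveCollation()

# Toy "index": collation key -> the original values stored in the records.
index = {}
for value in ("Firebird", "FIREBIRD", "firebird", "Interbase"):
    index.setdefault(coll.sort_key(value.encode("utf-8")), []).append(value)
```

Here the three spellings of "Firebird" share one index entry, so the key no longer carries the original characters, only their weights under that collation.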

Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/

Support list for Interbase and Firebird users :
firebird-support@yahoogroups.com

Nederlandse firebird nieuwsgroep :
news://newsgroups.firebirdsql.info