Subject: Re: [firebird-support] Writing UTF16 to the database
Author: David Johnson
On Tue, 2005-02-22 at 12:10, Brad Pepers wrote:
> 4. Comparison and collation with Unicode is not a trivial problem to
> solve. I'm not sure if there is any code out there to already do part
> of this but it's quite complicated, though with potential benefits if it's
> done right such as having case-less comparisons and being able to handle
> any mix of character sets properly.

A look at other products that have already resolved this problem might
be in order.

Java uses UTF-16 internally, and has solved the collation and
comparison problems with the Locale class. A Locale identifies a
language and region, and locale-sensitive classes such as
java.text.Collator and java.text.DecimalFormatSymbols use it to supply
the cultural norms (collation sequence, digit grouping and decimal
separators, currency symbol), so that the application code does not
need to be sensitive to these factors.

The Locale instance for es-US (Spanish, United States) provides
comparison and collation methods appropriate for Spanish speakers in
the US, which can differ from those for es-MX (Mexico), es-ES (Spain),
and plain es with no country identifier.
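In language-tag form (BCP 47, with ISO 3166 country codes) Spain is es-ES. One plausible way to let es-US, es-MX and es-ES carry their own rules while sharing what they have in common is a lookup that falls back from language-region to language to a default. The helper fallbackChain below is hypothetical, a sketch of that idea that uses java.util.Locale only to parse the tag:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleTags {
    public static void main(String[] args) {
        for (String tag : new String[] {"es-US", "es-MX", "es-ES", "es"}) {
            System.out.println(tag + " -> " + fallbackChain(tag));
        }
    }

    // Hypothetical lookup order for locale-specific rules: most specific first,
    // then the bare language, then "" standing for the engine default.
    static List<String> fallbackChain(String languageTag) {
        Locale loc = Locale.forLanguageTag(languageTag);
        List<String> chain = new ArrayList<>();
        if (!loc.getCountry().isEmpty()) {
            chain.add(loc.getLanguage() + "-" + loc.getCountry());
        }
        if (!loc.getLanguage().isEmpty()) {
            chain.add(loc.getLanguage());
        }
        chain.add(""); // default (binary) rules
        return chain;
    }
}
```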

For the Vulcan engine, this suggests a CLocale superclass that defines
the interface, either as abstract methods or with default behaviours
(binary comparison?), with a subclass for each of the supported
language/collation mechanisms. As in Java, a specific locale could be
obtained through a static method call that uses reflection or dynamic
library registration (.dll or .so), isolating the localization code
from the core engine code.
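A rough Java rendering of that design, under stated assumptions: CLocale, BinaryLocale, CaseInsensitiveLocale, register and forName are hypothetical names, and a static map of factories stands in for reflection or .dll/.so registration. The base class supplies binary comparison as the default, subclasses override it, and lookup falls back from a full tag such as en-US to the bare language and finally to the binary default.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class CLocaleDemo {
    public static void main(String[] args) {
        // A locale module would register itself when loaded (hypothetical).
        CLocale.register("en", CaseInsensitiveLocale::new);
        System.out.println(CLocale.forName("en-US").name());                    // case-insensitive
        System.out.println(CLocale.forName("fr-FR").name());                    // binary
        System.out.println(CLocale.forName("en-US").compare("apple", "Banana")); // -1
    }
}

// Hypothetical superclass: binary comparison is the default behaviour.
abstract class CLocale {
    private static final Map<String, Supplier<CLocale>> REGISTRY = new HashMap<>();

    // Stand-in for reflection or .dll/.so registration.
    static void register(String tag, Supplier<CLocale> factory) {
        REGISTRY.put(tag, factory);
    }

    // Static lookup: exact tag, then bare language, then the binary default.
    static CLocale forName(String tag) {
        Supplier<CLocale> f = REGISTRY.get(tag);
        if (f == null) {
            int dash = tag.indexOf('-');
            if (dash > 0) {
                f = REGISTRY.get(tag.substring(0, dash));
            }
        }
        return f != null ? f.get() : new BinaryLocale();
    }

    // Default: compare UTF-16 code units, normalised to -1, 0 or 1.
    int compare(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    abstract String name();
}

class BinaryLocale extends CLocale {
    @Override
    String name() { return "binary"; }
}

// Illustrative subclass: case-insensitive comparison.
class CaseInsensitiveLocale extends CLocale {
    @Override
    int compare(String a, String b) {
        return Integer.signum(a.compareToIgnoreCase(b));
    }

    @Override
    String name() { return "case-insensitive"; }
}
```

Keeping the registry behind a static lookup means the engine core only ever sees the CLocale interface; a real engine would populate the registry as each locale library is loaded.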

The locale-dependent methods that I can see being required, off the top
of my head, are:

int compare (UTF8_String, UTF8_String) // returns -1, 0 or 1
UTF8_Char placeSeparator ()            // digit grouping separator, e.g. ',' in en-US
UTF8_Char fractionSeparator ()         // decimal separator, e.g. '.' in en-US
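As a sketch of how little code those three methods need when a library already holds the locale data, here is a hypothetical JdkBackedLocale that delegates to Collator and DecimalFormatSymbols. It uses Java Strings and chars (UTF-16) in place of the UTF8_String and UTF8_Char types above, and reads "place separator" as the digit grouping separator and "fraction separator" as the decimal separator; both readings are assumptions.

```java
import java.text.Collator;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical implementation of the three locale-dependent methods.
class JdkBackedLocale {
    private final Collator collator;
    private final DecimalFormatSymbols symbols;

    JdkBackedLocale(Locale loc) {
        this.collator = Collator.getInstance(loc);
        this.symbols = DecimalFormatSymbols.getInstance(loc);
    }

    // Collator.compare only promises negative/zero/positive, so normalise to -1, 0, 1.
    int compare(String a, String b) {
        return Integer.signum(collator.compare(a, b));
    }

    // Digit grouping ("thousands") separator.
    char placeSeparator() {
        return symbols.getGroupingSeparator();
    }

    // Decimal separator.
    char fractionSeparator() {
        return symbols.getDecimalSeparator();
    }

    public static void main(String[] args) {
        JdkBackedLocale us = new JdkBackedLocale(Locale.US);
        JdkBackedLocale de = new JdkBackedLocale(Locale.GERMANY);
        System.out.println(us.compare("apple", "Banana"));                      // -1
        System.out.println(us.placeSeparator() + " " + us.fractionSeparator()); // , .
        System.out.println(de.placeSeparator() + " " + de.fractionSeparator()); // . ,
    }
}
```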

The sort method in the database engine would then look something like
this (pseudocode):

resultSet Sort (resultSet rs, const CLocale& loc)
{
    // preferred sort algorithm with ...
    int testResult = loc.compare (rs.string1, rs.string2);
    switch (testResult)
    {
    case 0:  // some action on equal
        break;
    case 1:  // some action on greater
        break;
    case -1: // some action on less
        break;
    }
    return rs;
}
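In Java the less/equal/greater branching of the pseudocode above is normally left to a general-purpose sort, with the locale supplying only the comparison. Collator implements Comparator, so it can be passed straight to List.sort. A minimal sketch (LocaleSort, sort and sortBinary are illustrative names; plain strings stand in for result-set rows):

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class LocaleSort {
    public static void main(String[] args) {
        List<String> rows = List.of("cherry", "Banana", "apple");
        System.out.println(sortBinary(rows));      // [Banana, apple, cherry]
        System.out.println(sort(rows, Locale.US)); // [apple, Banana, cherry]
    }

    // Sort a copy of the rows using the locale's collation; the sort itself
    // does the branching on the comparison result.
    static List<String> sort(List<String> rows, Locale loc) {
        List<String> result = new ArrayList<>(rows);
        result.sort(Collator.getInstance(loc));
        return result;
    }

    // Binary (UTF-16 code unit) order, for contrast.
    static List<String> sortBinary(List<String> rows) {
        List<String> result = new ArrayList<>(rows);
        result.sort(null); // null means natural ordering (String.compareTo)
        return result;
    }
}
```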