Subject Re: UTF-8 (various)
Author David Johnson
My contribution to this debate will show my Java bias immediately ...
:o)

With UTF-8 (and other Unicode solutions), localization (the collation,
comparison, and capitalization rule sets) is both explicitly and
implicitly separate from the character encoding.
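
To make that separation concrete, here is a small Java sketch. It is only an
illustration, not anything from the engine: the class and method names are
made up, and it leans on the JDK's java.text.Collator. The same UTF-8 encoded
strings come back in a different order depending on whether the sort uses raw
code units or a locale's collation rules.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; none of these names come from Firebird.
final class EncodingVsCollation {

    // Encode to UTF-8 and decode again: the bytes carry no ordering rules.
    static String roundTripUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Order by raw UTF-16 code units, which ignores any language rules.
    static String[] sortByCodeUnits(String[] words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Order by the collation rules the JDK supplies for the given locale.
    static String[] sortByLocale(String[] words, Locale locale) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // \u00F1 is n-tilde, \u00FA is u-acute.
        String[] words = {"oso", "\u00F1and\u00FA", "zorro"};
        for (int i = 0; i < words.length; i++) {
            words[i] = roundTripUtf8(words[i]);
        }
        // Code unit order puts the n-tilde word last (U+00F1 is above 'z').
        System.out.println(Arrays.toString(sortByCodeUnits(words)));
        // Spanish collation puts it before "oso".
        System.out.println(Arrays.toString(sortByLocale(words, Locale.forLanguageTag("es-MX"))));
    }
}
```

The encoding step is identical in both cases; only the comparison rule
changes, which is the point of keeping the two concerns apart.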

SQL-2003 syntax includes the COLLATE clause for indexes and sorts.

In the engine implementation, a Locale object will exist for every
collation that is supported. Subclasses of Locale will implement the
localized rules for case conversion and string comparison.

int Locale.compare (String _value1, String _value2);
String Locale.uppercase (String _value);
String Locale.lowercase (String _value);

and of course, getting the correct locale object ...

Locale Locale.getLocale (String _ISOLocaleCode);
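
As a rough sketch of how those signatures could hang together in Java (my
assumptions, not engine code): the class is renamed CollationLocale so it
does not clash with java.util.Locale, the registry and the register method
are hypothetical, and the one subclass shown simply delegates to the JDK's
Collator and String case conversion. Mapping the code 'ESMX' to es-MX is
also an assumption.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the "Locale" class sketched above.
abstract class CollationLocale {
    private static final Map<String, CollationLocale> REGISTRY = new ConcurrentHashMap<>();

    abstract int compare(String value1, String value2);
    abstract String uppercase(String value);
    abstract String lowercase(String value);

    // Hypothetical registration hook; an engine might fill this at startup.
    static void register(String code, CollationLocale locale) {
        REGISTRY.put(code, locale);
    }

    // Look up the rules for a collation code, failing loudly if unknown.
    static CollationLocale getLocale(String code) {
        CollationLocale found = REGISTRY.get(code);
        if (found == null) {
            throw new IllegalArgumentException("Unsupported collation: " + code);
        }
        return found;
    }
}

// One possible subclass: delegate everything to the JDK's own locale data.
final class JdkCollationLocale extends CollationLocale {
    private final Locale jdkLocale;
    private final Collator collator;

    JdkCollationLocale(Locale jdkLocale) {
        this.jdkLocale = jdkLocale;
        this.collator = Collator.getInstance(jdkLocale);
    }

    @Override
    int compare(String value1, String value2) {
        return collator.compare(value1, value2);
    }

    @Override
    String uppercase(String value) {
        return value.toUpperCase(jdkLocale);
    }

    @Override
    String lowercase(String value) {
        return value.toLowerCase(jdkLocale);
    }

    public static void main(String[] args) {
        // Assumption: 'ESMX' means Spanish as used in Mexico.
        CollationLocale.register("ESMX", new JdkCollationLocale(Locale.forLanguageTag("es-MX")));
        CollationLocale es = CollationLocale.getLocale("ESMX");
        System.out.println(es.compare("\u00F1and\u00FA", "oso") < 0);
        System.out.println(es.uppercase("ma\u00F1ana"));
    }
}
```

Case conversion belongs in the same object as comparison because it is
locale dependent too: under Turkish rules, for example, uppercasing "i"
gives the dotted capital I (U+0130) rather than "I".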

Packaged as shared libraries (.dll or .so), the locale classes can
actually live outside the engine.

Here is where I show off my ignorance of Firebird's innards ...

Given the SQL

select F1, F2, F3 from T1
order by F1 COLLATE ESMX

The engine performs the initial result set selection as always, then
applies the sort before returning data to the client application. The
sort code differs from the extant code in two places:

1. At the start of the sort, the sort method calls Locale.getLocale with
the literal 'ESMX' to ensure that the correct localization rules are
applied:

Locale sortLocale = Locale.getLocale (SortLocale);

2. The extant call to the current string comparison mechanism is
replaced with a call to sortLocale.compare:

compareResult = sortLocale.compare (S1, S2);
if (compareResult == 0) ...
else if (compareResult < 0) ...
else if (compareResult > 0) ...
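
Putting the two changes together, a minimal Java sketch of the sort step
might look like the following. It says nothing about how Firebird's sort is
really written: the class, the lookupCollation helper, and the mapping of
'ESMX' to es-MX are all assumptions, and a plain insertion sort is used only
so that the single comparison call stays visible.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; not taken from the engine's actual sort code.
final class CollationSortSketch {

    // Stand-in for Locale.getLocale in the post.
    // Assumption: 'ESMX' means Spanish as used in Mexico (es-MX).
    static Collator lookupCollation(String code) {
        if ("ESMX".equals(code)) {
            return Collator.getInstance(Locale.forLanguageTag("es-MX"));
        }
        throw new IllegalArgumentException("Unsupported collation: " + code);
    }

    // Sort the values of one column under the named collation.
    static List<String> sortColumn(List<String> values, String collationCode) {
        // Change 1: resolve the collation once, at the start of the sort.
        Collator sortLocale = lookupCollation(collationCode);

        List<String> sorted = new ArrayList<>(values);
        for (int i = 1; i < sorted.size(); i++) {
            String current = sorted.get(i);
            int j = i - 1;
            // Change 2: every comparison goes through the collation object.
            while (j >= 0 && sortLocale.compare(sorted.get(j), current) > 0) {
                sorted.set(j + 1, sorted.get(j));
                j--;
            }
            sorted.set(j + 1, current);
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<String> f1 = Arrays.asList("zorro", "oso", "\u00F1and\u00FA");
        System.out.println(sortColumn(f1, "ESMX"));
    }
}
```

Nothing else in the sort needs to know about the language: swapping the
collation code swaps the ordering rules, and the selection of the result
set is untouched.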