Subject Re: UTF-8 vs UTF-16
Author peter_jacobi.rm
Hi Dimitry,

In Firebird-Architect@yahoogroups.com, "Dimitry Sibiryakov" wrote:
> If one want to restrict set of characters that can be stored in a
> column CHECK constraints is a good way to do so. May be a new
> functions to check subsets should be introduced: something like
> 'value is cyrillic or latin-1'. Or to keep old dataset syntax but
> only for check purposes.

I agree that you cannot support all sorts of character
repertoire constraints by introducing ever more character
sets without quickly falling victim to combinatorial
explosion.

But it would be nice to have no-control-character charsets.

A UDF to check for the occurrence of some character classes,
including the Unicode-defined character classes, would be
a very important contribution. Any volunteers?

If the result is to be returned as a bitmask
in a long long, we have 64 classes to come up with.
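
To make the bitmask idea concrete, here is a rough sketch in
Python (a real UDF would be written in C; this only shows the
logic). It assumes one bit per Unicode general category, which
uses 30 of the 64 available bits. The names CATEGORIES,
class_mask and has_control_chars are made up for illustration
and are not part of any existing Firebird API.

```python
import unicodedata

# The 30 Unicode general categories. Bit i of the mask stands for
# CATEGORIES[i], so the result fits easily into a 64-bit integer.
CATEGORIES = (
    "Lu", "Ll", "Lt", "Lm", "Lo",              # letters
    "Mn", "Mc", "Me",                          # marks
    "Nd", "Nl", "No",                          # numbers
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",  # punctuation
    "Sm", "Sc", "Sk", "So",                    # symbols
    "Zs", "Zl", "Zp",                          # separators
    "Cc", "Cf", "Cs", "Co", "Cn",              # control, format, other
)
BIT = {cat: 1 << i for i, cat in enumerate(CATEGORIES)}


def class_mask(text: str) -> int:
    """Return a bitmask with one bit set per category occurring in text."""
    mask = 0
    for ch in text:
        mask |= BIT[unicodedata.category(ch)]
    return mask


def has_control_chars(text: str) -> bool:
    """True if text contains any character of category Cc (control)."""
    return bool(class_mask(text) & BIT["Cc"])
```

A CHECK constraint could then test the returned mask against
the bits it wants to forbid, for example the Cc bit for a
"no control characters" column.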

And then there is checking for (and converting to) the
four Unicode normalization forms (NFC, NFD, NFKC, NFKD).
Ugh, this smells like serious work.
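
For the normalization part, a similar sketch in Python, again
only to illustrate what such functions would have to do. It
leans on the standard unicodedata module; to_form and
normalized_forms are hypothetical names, and the check simply
compares the text with its normalized copy.

```python
import unicodedata

# The four Unicode normalization forms.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def to_form(text: str, form: str) -> str:
    """Convert text to the given normalization form."""
    if form not in FORMS:
        raise ValueError("unknown normalization form: " + form)
    return unicodedata.normalize(form, text)


def normalized_forms(text: str) -> set:
    """Return the set of forms that text already conforms to."""
    return {f for f in FORMS if unicodedata.normalize(f, text) == text}
```

For instance, a precomposed U+00E9 is already in NFC and NFKC
but not in NFD or NFKD, while plain ASCII text is in all four.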

Regards,
Peter Jacobi