Subject Re: UTF-8 vs UTF-16
Author peter_jacobi.rm
Hi Dimitry,

"Dimitry Sibiryakov" wrote:
> If one want to restrict set of characters that can be stored in a
> column CHECK constraints is a good way to do so. May be a new
> functions to check subsets should be introduced: something like
> 'value is cyrillic or latin-1'. Or to keep old dataset syntax but
> only for check purposes.

I agree that you cannot support every sort of character
repertoire constraint by introducing ever more character
sets without quickly falling victim to combinatorial explosion.

But it would be nice to have no-control-character charsets.

A UDF to check for the occurrence of some character classes,
including the UNICODE-defined character classes, would be
a very important contribution. Any volunteers?

If the result is to be returned as a bitmask
in a long long, we have 64 classes to come up with.
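
To make the bitmask idea concrete, here is a rough sketch in Python
rather than as a real UDF. The bit assignments and the names
class_mask and has_class are my own assumptions, not an existing
API. It uses one bit per UNICODE general category, which needs only
30 of the 64 bits:

```python
import unicodedata

# Hypothetical bit layout: one bit per Unicode general category.
# The order is arbitrary; a real UDF would have to fix it in its documentation.
CATEGORIES = [
    "Lu", "Ll", "Lt", "Lm", "Lo",                # letters
    "Mn", "Mc", "Me",                            # marks
    "Nd", "Nl", "No",                            # numbers
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",    # punctuation
    "Sm", "Sc", "Sk", "So",                      # symbols
    "Zs", "Zl", "Zp",                            # separators
    "Cc", "Cf", "Cs", "Co", "Cn",                # control, format, other
]
BIT = {cat: 1 << i for i, cat in enumerate(CATEGORIES)}


def class_mask(text):
    """Return a bitmask with one bit set per general category found in text."""
    mask = 0
    for ch in text:
        mask |= BIT[unicodedata.category(ch)]
    return mask


def has_class(text, cat):
    """True if text contains at least one character of general category cat."""
    return bool(class_mask(text) & BIT[cat])
```

A no-control-character check would then just test that the "Cc" bit
is clear, e.g. not has_class(value, "Cc").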

Then there is checking for (and converting to) the four
UNICODE normalization forms (NFC, NFD, NFKC, NFKD). Ugh,
this smells like serious work.
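
For illustration only, the checking half could look roughly like
this in Python, leaning on the standard unicodedata module. The
function name normalized_forms is made up for this sketch, and a
real UDF would need its own UNICODE tables or a library to do the
same job:

```python
import unicodedata

# The four Unicode normalization forms.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_forms(text):
    """Return the set of normalization forms that text already satisfies.

    A string is in a given form exactly when normalizing it to that
    form leaves it unchanged.
    """
    return {form for form in FORMS if unicodedata.normalize(form, text) == text}
```

Converting is the single call unicodedata.normalize(form, text), so
most of the serious work hides in the tables behind it.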

Peter Jacobi