firebird-architect - Re: UTF-8 vs UTF-16

Subject	Re: UTF-8 vs UTF-16
Author	peter_jacobi.rm
Post date	2003-08-15T17:56:38Z

Hi Nickolay,

> Forgot to ask. What encoding are you really going to implement ?
> Currently engine implements UNICODE_FSS, not UTF8.
> You propose UCS2, not UTF16, right ?

I propose the UTF-16BE encoding of the Unicode subset
U+0000..U+FFFF, i.e. forget the astral planes; they
would only give us trouble.
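
To make the proposed restriction concrete, here is a minimal sketch
in Python. The function name encode_bmp_utf16be is hypothetical, and
rejecting lone surrogate code points is an assumption on top of the
proposal, not part of it.

```python
def encode_bmp_utf16be(text: str) -> bytes:
    """Encode text as UTF-16BE, accepting only U+0000..U+FFFF.

    Hypothetical helper for illustration only. Every accepted
    character becomes exactly one big-endian 16-bit word.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Astral-plane character: would need a surrogate pair.
            raise ValueError(f"U+{cp:04X} is outside U+0000..U+FFFF")
        if 0xD800 <= cp <= 0xDFFF:
            # Assumption: surrogate code points are not characters,
            # so they are rejected as well.
            raise ValueError(f"U+{cp:04X} is a surrogate code point")
        out += cp.to_bytes(2, "big")
    return bytes(out)
```

With this restriction the byte length is always twice the character
count, which is the property that makes the subset attractive.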

> I want to remind you that UTF8 character may occupy 1-6 bytes.

I suspect this has been clarified to 1-4 bytes in the most
recent standards.
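
A minimal sketch of where the 1-4 figure comes from, assuming code
points are limited to U+0000..U+10FFFF (the 5- and 6-byte forms only
exist in the older definition covering a 31-bit code space). The
function name utf8_length is hypothetical.

```python
def utf8_length(cp: int) -> int:
    """Return the number of bytes UTF-8 uses for one code point.

    Hypothetical helper; assumes the code space ends at U+10FFFF.
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point outside U+0000..U+10FFFF")
    if cp < 0x80:
        return 1      # U+0000..U+007F
    if cp < 0x800:
        return 2      # U+0080..U+07FF
    if cp < 0x10000:
        return 3      # U+0800..U+FFFF
    return 4          # U+10000..U+10FFFF (astral planes)
```

Note that every code point in U+0000..U+FFFF needs at most 3 bytes;
only the astral planes need the fourth.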

> UTF16 character may occupy 1-3 two-byte words.

1 or 2: either a single 16-bit word that is not from
the surrogate area U+D800..U+DFFF, or a
high surrogate followed by a low surrogate.
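
The rule can be sketched as follows, assuming the input is a list of
16-bit integers already in native byte order. The function name
decode_utf16_units is hypothetical; the sub-ranges U+D800..U+DBFF
(high) and U+DC00..U+DFFF (low) are the standard split of the
surrogate area.

```python
def decode_utf16_units(units):
    """Decode a list of 16-bit words into a list of code points.

    Hypothetical helper: one word per BMP character, or a high
    surrogate followed by a low surrogate for everything else.
    """
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next word must be a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("high surrogate without a low surrogate")
            low = units[i + 1]
            result.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            # Ordinary BMP character: a single word.
            result.append(u)
            i += 1
    return result
```

There is no case in which a character consumes three words.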

> Both UNICODE_FSS and UCS2 are now obsolete. So fixing MBCS support
> inside the firebird engine is very important.

I assume UNICODE_FSS (which, according to my sources, never
quite existed under this name) is equivalent to the UTF-8
encoding of the Unicode subset U+0000..U+FFFF. Otherwise,
please enlighten me on this issue.

In summary, I would find it more useful to support
more defined subsets of Unicode than to extend support
to the astral planes, which would cause trouble in
a lot of other tools and programming languages.

Regards,
Peter Jacobi