Subject | Re: UTF-8 vs UTF-16 |
---|---|
Author | peter_jacobi.rm |
Post date | 2003-08-15T17:56:38Z |
Hi Nickolay,
> Forgot to ask. What encoding are you really going to implement?
> Currently engine implements UNICODE_FSS, not UTF8.
> You propose UCS2, not UTF16, right?

I propose UTF16BE encoding of the UNICODE subset
U+0000..U+FFFF, i.e. forget the astral planes; they
would only give us trouble.

> I want to remind you that UTF8 character may occupy 1-6 bytes.

I suspect this is clarified to 1-4 bytes in the most recent
standards.
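
For what it is worth, the byte counts are easy to check. The following is only a minimal sketch in Python, assuming its built-in UTF-8 codec reflects the current definition of UTF-8; the sample code points and the helper name utf8_len are my own picks for illustration.

```python
# Sample code points (chosen for illustration) around the UTF-8 length boundaries.
SAMPLES = [0x0041, 0x00E9, 0x20AC, 0xFFFD, 0x10000, 0x10FFFF]


def utf8_len(code_point: int) -> int:
    """Number of bytes Python's UTF-8 codec emits for one code point."""
    return len(chr(code_point).encode("utf-8"))


for cp in SAMPLES:
    print(f"U+{cp:04X}: {utf8_len(cp)} byte(s)")
```

With this codec, nothing in U+0000..U+FFFF needs more than 3 bytes, and the astral planes up to U+10FFFF need 4. Surrogate code points are left out of the samples because the codec refuses to encode them on their own.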

> UTF16 character may occupy 1-3 two-byte words.

1 or 2: either a single 16-bit word, which must not come
from the Surrogate Area U+D800..U+DFFF, or a
High-Surrogate followed by a Low-Surrogate.
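
To make the "1 or 2" rule concrete, here is a rough sketch in Python of how a code point maps to 16-bit words. The function name utf16_units is hypothetical, and the arithmetic is the usual surrogate-pair computation as I understand it, not code taken from any engine.

```python
from typing import List


def utf16_units(code_point: int) -> List[int]:
    """Return the UTF-16 code units (16-bit words) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not characters")
    if code_point <= 0xFFFF:
        # U+0000..U+FFFF (minus the Surrogate Area): one 16-bit word.
        return [code_point]
    # Astral planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # High-Surrogate, U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)   # Low-Surrogate, U+DC00..U+DFFF
    return [high, low]


print([hex(u) for u in utf16_units(0x0041)])    # one word
print([hex(u) for u in utf16_units(0x1D11E)])   # two words (surrogate pair)
```

Restricting the subset to U+0000..U+FFFF means only the single-word branch is ever taken, so every character occupies exactly two bytes.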

> Both UNICODE_FSS and UCS2 are now obsolete. So fixing MBCS support
> inside the firebird engine is very important.

I assume UNICODE_FSS (which never quite existed under this
name, according to my sources) is equivalent to the UTF-8 encoding
of the UNICODE subset U+0000..U+FFFF. Otherwise, please
enlighten me on this issue.

In summary, I would see it as more useful to support
more defined subsets of Unicode than to extend the support
to the astral planes, which would cause trouble in
a lot of other tools and computer languages.
Regards,
Peter Jacobi