| Subject | Re: [Firebird-Architect] Re: UTF-8 vs UTF-16 | 
|---|---|
| Author | Nickolay Samofatov | 
| Post date | 2003-08-15T18:31:19Z | 
Hello, Peter!

This is mostly a theoretical answer. The practical part will be in the next
letter.

>> Forgot to ask. What encoding are you really going to implement?
>> Currently the engine implements UNICODE_FSS, not UTF8.
>> You propose UCS2, not UTF16, right?
> I propose UTF16BE encoding of the UNICODE subset
> U+0000..U+FFFF, i.e. forget the astral planes, they
> would give us only troubles.

Various OSes and products are moving to full Unicode standard conformance
and thus starting to support unusual codepoints. But the Unicode standard
doesn't require implementing support for all defined codepoints
anyway.
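
A minimal sketch of the encoding Peter proposes, assuming "UTF16BE of the subset U+0000..U+FFFF" means every character is exactly one big-endian 16-bit unit and anything outside the BMP (or a lone surrogate) is rejected. The helper name `encode_bmp_utf16be` is hypothetical and not part of Firebird.

```python
import struct


def encode_bmp_utf16be(text: str) -> bytes:
    """Encode text as UTF-16BE, accepting only BMP characters (hypothetical helper)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Astral-plane character: it would need a surrogate pair, so reject it.
            raise ValueError(f"U+{cp:04X} is outside the BMP")
        if 0xD800 <= cp <= 0xDFFF:
            # A surrogate code point on its own is not a character.
            raise ValueError(f"U+{cp:04X} is a surrogate code point")
        # Exactly one big-endian 16-bit code unit per character.
        out += struct.pack(">H", cp)
    return bytes(out)


print(encode_bmp_utf16be("A\u00e9\u20ac").hex())  # 004100e920ac
```

Under this restriction every character takes exactly two bytes, so the output is byte-for-byte what big-endian UCS2 would produce; the price is that astral-plane characters cannot be stored at all.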

>> I want to remind you that a UTF8 character may occupy 1-6 bytes.
> I suspect this is clarified to 1-4 bytes in the most recent
> standards.

True. In recent versions they clarified this.
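
The two figures differ only in the allowed code point range. A short sketch computing the UTF-8 sequence length for one code point, using the length table of the older RFC 2044 (values up to 0x7FFFFFFF, 1-6 bytes) and, by default, the U+10FFFF ceiling of RFC 3629 (1-4 bytes). The function name `utf8_length` and its `max_cp` parameter are hypothetical.

```python
# Highest code point for each UTF-8 length class (1..6 bytes), per the RFC 2044 table.
_LENGTH_LIMITS = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)


def utf8_length(cp: int, max_cp: int = 0x10FFFF) -> int:
    """Return how many bytes UTF-8 uses for code point cp.

    max_cp=0x10FFFF applies the RFC 3629 range (at most 4 bytes);
    max_cp=0x7FFFFFFF applies the older RFC 2044 range (at most 6 bytes).
    Surrogates (U+D800..U+DFFF), which RFC 3629 also forbids, are not checked here.
    """
    if not 0 <= cp <= max_cp:
        raise ValueError(f"code point {cp:#x} is outside 0..{max_cp:#x}")
    for nbytes, limit in enumerate(_LENGTH_LIMITS, start=1):
        if cp <= limit:
            return nbytes
    raise ValueError(f"code point {cp:#x} exceeds the 6-byte form")


print(utf8_length(0x1F600))                        # 4
print(utf8_length(0x7FFFFFFF, max_cp=0x7FFFFFFF))  # 6
```

With the RFC 3629 ceiling the 5- and 6-byte classes can never be reached, which is the 1-4 byte clarification discussed above.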

>> A UTF16 character may occupy 1-3 two-byte words.
> 1 or 2: either a single 16bit word which must not lie
> in the Surrogate Area U+D800..U+DFFF, or a
> High-Surrogate followed by a Low-Surrogate.

True. My recollection dates from several years ago, when I researched
this subject. Look at RFC 2044, for example:
http://www.ietf.org/rfc/rfc2044.txt
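
A sketch of the UTF-16 rule Peter describes: a BMP code point outside U+D800..U+DFFF is one 16-bit unit, and a code point in U+10000..U+10FFFF becomes a High-Surrogate followed by a Low-Surrogate. The function name `utf16_units` is hypothetical.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for a single code point."""
    if 0xD800 <= cp <= 0xDFFF or not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a valid Unicode scalar value")
    if cp <= 0xFFFF:
        # BMP character: a single 16-bit unit.
        return [cp]
    # Astral character: subtracting 0x10000 leaves a 20-bit value,
    # which is split into two 10-bit halves.
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)   # High-Surrogate, U+D800..U+DBFF
    low = 0xDC00 + (v & 0x3FF)  # Low-Surrogate, U+DC00..U+DFFF
    return [high, low]


print([hex(u) for u in utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```

Since the largest value, U+10FFFF, maps to 0xDBFF 0xDFFF, no code point ever needs a third unit.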

>> Both UNICODE_FSS and UCS2 are now obsolete. So fixing MBCS support
>> inside the Firebird engine is very important.
> I assume UNICODE_FSS (which never quite existed under this
> name, according to my sources) is equivalent to the UTF-8 encoding
> of the UNICODE subset U+0000..U+FFFF. Otherwise, please
> enlighten me on this issue.

It existed under the name Unicode FSS/UTF (around 1994), and later the FSS
prefix was dropped.
FSS/UTF == File System Safe UCS Transformation Format.
But at the time this name was used in Firebird, all Unicode codepoints fit in
16 bits.
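
If, as Peter assumes, UNICODE_FSS is UTF-8 restricted to U+0000..U+FFFF, then no character needs more than 3 bytes. The check below is a brute-force sketch of that claim using Python's built-in UTF-8 codec; it skips the surrogate range, which the codec refuses to encode. The function name is hypothetical.

```python
def max_utf8_bytes_in_bmp() -> int:
    """Largest UTF-8 sequence length over all BMP scalar values."""
    longest = 0
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogates are not characters; Python's UTF-8 codec rejects them.
            continue
        longest = max(longest, len(chr(cp).encode("utf-8")))
    return longest


print(max_utf8_bytes_in_bmp())  # 3
```

Allowing the astral planes would raise this bound to 4 bytes per character.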

> In summary, I would see it as more useful to support
> more defined subsets of Unicode than to extend the support
> to the astral planes, which would give trouble in a
> lot of other tools and computer languages.
> Peter Jacobi

There should be no trouble if we fix MBCS support in the engine.

Nickolay Samofatov