Subject | RE: [Firebird-Architect] Re: UTF-8 vs UTF-16 |
---|---|
Author | David Schnepper |
Post date | 2003-08-24T18:09:57Z |
Peter wrote:
Although UCS-4 and UTF-16 provide comprehensive ways to represent several
character sets, they do not preserve the byte values for ASCII characters.
Because all UNIX systems are based on an ASCII kernel, they reserve certain
character codes for I/O operations, such as the null character as a string
terminator, the slash (/) character as a path name separator, and the DEL
and SPACE control characters. To circumvent this problem, another version of
UTF was devised, called FSS-UTF (File System Safe-UTF), now commonly known
as UTF-8.
At the time (1992) I implemented Unicode-FSS (which later became known as
FSS-UTF).
There were other encoding proposals floating around, but I liked FSS because:
a) There are no embedded 0 bytes, except for a real EOS (end of string).
b) Anything that "looked like" a file system character ( : / . a-z, etc.)
really was a file system character.
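As a rough illustration of (a) and (b), here is a small Python sketch. It is not code from the 1992 implementation; the sample path and variable names are made up for the example, and it uses Python's standard UTF-8 codec as a stand-in for FSS.

```python
# Illustrative sketch only: compare UTF-8 and UTF-16 on a path-like string.
# The sample path is hypothetical; it mixes ASCII with non-ASCII characters.
path = "/tmp/caf\u00e9/\u65e5\u672c.txt"

utf8 = path.encode("utf-8")
utf16 = path.encode("utf-16-le")

# (a) UTF-8 produces a 0 byte only for U+0000 itself, so a string with no
#     NUL character has no embedded 0 bytes. UTF-16 pads ASCII with 0 bytes.
has_nul_utf8 = 0 in utf8
has_nul_utf16 = 0 in utf16

# (b) Every byte below 0x80 in the UTF-8 output is a genuine ASCII character
#     from the input, in the same order. Multi-byte sequences use only
#     bytes in the range 0x80..0xFF.
ascii_bytes = bytes(b for b in utf8 if b < 0x80)
ascii_chars = "".join(c for c in path if ord(c) < 0x80).encode("ascii")

# A character whose UTF-16 code unit happens to contain the byte 0x2F ("/"):
# U+2F00 (KANGXI RADICAL ONE). A byte-oriented file system would see a slash.
radical = "\u2f00"
slash_in_utf16 = b"/" in radical.encode("utf-16-le")
slash_in_utf8 = b"/" in radical.encode("utf-8")

print("NUL in UTF-8:", has_nul_utf8, "| NUL in UTF-16:", has_nul_utf16)
print("ASCII bytes preserved in UTF-8:", ascii_bytes == ascii_chars)
print("False '/' in UTF-16:", slash_in_utf16, "| in UTF-8:", slash_in_utf8)
```

The U+2F00 case is the kind of thing (b) guards against: in UTF-16 one of its bytes is indistinguishable from a path separator, while in UTF-8 no byte of a multi-byte character can be mistaken for ASCII.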
I agree that no one else picked up on the name UNICODE_FSS - I made it up and
no one else agreed with my wisdom. <grin>
Dave
>here's a quote from http://docs.sun.com/db/doc/806-5584/6jej8rb0l?a=view
> I assume UNICODE_FSS (which never quite existed under this
> name, according to my sources) is equivalent to the UTF-8 encoding
> of the UNICODE subset U+0000..U+FFFF. Otherwise, please
> enlighten me on this issue.
>