Subject | Re: [firebird-support] OT: Unicode |
---|---|
Author | Brad Pepers |
Post date | 2005-04-20T07:22:45Z |
Thomas Steinmaurer wrote:
> Hi all,
>
> I'm sorry for abusing this list with that topic, but possibly someone
> has an answer. ;-)
>
> Many DBMS claim to support Unicode and hereby quite a lot of different
> terms appear. UTF-8, UCS16, UCS32, UCS-2 (or is it UCS2?), UTF-16, ...
>
> I do have basic knowledge about Unicode, but I get confused by UCS16,
> UCS2, UTF-16 and so on. Do they mean the same? For example, is UTF-16
> and UCS-2 (UCS2?) the same? Or is one an enhancement of the other?
>
> If anybody knows a good compact online reference on the differences, I
> will owe you a drink at the next Firebird Conference. ;-)
I find this a good link to the whole thing:
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Basically you have UCS (Universal Character Set), which is ISO standard
10646, and then you have Unicode, which was developed by a consortium of
mostly US companies. The two groups realized that having two different
universal character sets was not a good idea, so they made their
standards compatible and equivalent, though they are still published
separately. The Unicode standard has more info on how to render the
characters and how to handle sorting and comparisons, while the ISO
standard is just a character mapping.
Originally the character set space was set up to be 31 bits, though both
standards have agreed to stick to only using 21 bits now (code points up
to U+10FFFF). There are lots of options for encoding these characters
into bytes, and that's what UCS-2, UCS-4, UTF-8, UTF-16, ... are all
about. The link above should explain how at least some of these
encodings work and their pros and cons.
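As a rough illustration of the encoding point (my own sketch, not taken from the link), here is a small Python snippet that uses only the built-in codecs. The two sample characters and the helper name `encoded_sizes` are just assumptions chosen for the example. It counts how many bytes each encoding needs for one character that fits in 16 bits and one that does not. The second case is where UCS-2 and UTF-16 differ: UCS-2 is fixed at two bytes per character and cannot represent it, while UTF-16 uses two 16-bit units (a surrogate pair).

```python
# Illustrative sketch only: the sample characters are arbitrary choices.
samples = {
    "U+00E4 (fits in 16 bits)": "\u00e4",
    "U+1D11E (needs more than 16 bits)": "\U0001D11E",
}


def encoded_sizes(ch):
    """Return the number of bytes each encoding needs for one character."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-be" variants are used so that no byte order mark is counted.
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32 (UCS-4)": len(ch.encode("utf-32-be")),
    }


for label, ch in samples.items():
    print(label, encoded_sizes(ch))

# The highest code point both standards allow fits in 21 bits.
print((0x10FFFF).bit_length())
```

UTF-8 and UTF-16 are variable-width, so the byte count depends on the character, while UTF-32 always uses four bytes.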
Hope this helps! Here is another link in case the above didn't cover
everything...
http://en.wikipedia.org/wiki/Unicode
--
Brad Pepers
brad@...