Subject Re: Creating GB18030 character set and collation
Author peter_jacobi.rm
Hi David,

Have you seen this article yet? It is a good place to start reading:
http://www-106.ibm.com/developerworks/library/u-china.html

--- In firebird-support@yahoogroups.com, dhay@l... wrote:
> Just found out that we can ignore the 4 byte characters - so just have to
> deal with the 1 and 2-byte characters.

Without having really verified this, I assume that GB18030
without the 4-byte sequences is what is known informally
as GBK, an extension of GB2312 which wasn't formally adopted
as a standard.
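
To see whether a given piece of data really stays within the 1- and
2-byte part, one can classify sequences by their first two bytes. The
following is only a rough sketch in Python, assuming the usual GB18030
byte layout (single bytes up to 0x7F, lead bytes 0x81-0xFE, second byte
0x30-0x39 for 4-byte sequences, 0x40-0x7E or 0x80-0xFE for 2-byte
sequences). The function names are made up for illustration, and
treating 0x80 and 0xFF as single bytes is my assumption.

```python
def gb18030_sequence_lengths(data):
    """Return the byte length (1, 2 or 4) of each sequence in GB18030 data.

    Only the lead byte and the second byte are inspected; the third and
    fourth bytes of a 4-byte sequence are not validated in this sketch.
    """
    lengths = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x80 or lead == 0xFF:
            # ASCII; 0x80 and 0xFF are passed through as single bytes
            # (an assumption, they are not valid lead bytes).
            n = 1
        else:
            if i + 1 >= len(data):
                raise ValueError("truncated sequence at offset %d" % i)
            second = data[i + 1]
            if 0x30 <= second <= 0x39:
                n = 4          # a digit as second byte marks a 4-byte sequence
            elif 0x40 <= second <= 0xFE and second != 0x7F:
                n = 2          # the GBK-style 2-byte range
            else:
                raise ValueError("invalid second byte at offset %d" % (i + 1))
        if i + n > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        lengths.append(n)
        i += n
    return lengths


def uses_four_byte_sequences(data):
    """True if the data contains at least one 4-byte GB18030 sequence."""
    return 4 in gb18030_sequence_lengths(data)
```

For example, uses_four_byte_sequences(some_bytes) could serve as a quick
check on sample data before deciding to drop 4-byte support.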

For the differences between GB2312 and GBK/GB18030, check the
mapping tables of any of the following (or write your own test
program linked against iconv/libiconv):

ICU:
http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml

Mozilla:
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/gb180304bytes.uf

GNU libc iconv:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/?cvsroot=glibc

GNU libiconv:
http://www.gnu.org/software/libiconv/
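
As an illustration of the kind of test program meant above, here is a
small sketch that uses Python's built-in "gb2312", "gbk" and "gb18030"
codecs instead of iconv. That is a substitution on my part, and Python's
tables may differ from ICU or libiconv in edge cases, so treat the result
as a first look only. The helper names are hypothetical.

```python
def encodable(ch, codec):
    """True if the character can be encoded with the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def gbk_only_chars(start=0x4E00, end=0x9FA5):
    """Characters in the given code point range that GBK covers
    but GB2312 does not (default range: CJK Unified Ideographs)."""
    result = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if encodable(ch, "gbk") and not encodable(ch, "gb2312"):
            result.append(ch)
    return result


def same_bytes_in_gb18030(ch):
    """True if the GBK and GB18030 encodings of the character agree."""
    return ch.encode("gbk") == ch.encode("gb18030")


if __name__ == "__main__":
    extra = gbk_only_chars()
    print("ideographs in GBK but not in GB2312:", len(extra))
    print("all identical in GB18030:",
          all(same_bytes_in_gb18030(ch) for ch in extra))
```

A traditional character such as U+9AD4 is a handy spot check: it is
missing from GB2312 but present in GBK.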

After checking licensing issues, linking against libiconv
may be the shortest path to get the conversion part running.

Regards,
Peter Jacobi