Subject | Re: Creating GB18030 character set and collation |
---|---|
Author | peter_jacobi.rm |
Post date | 2004-01-16T07:40:27Z |
Hi David,
Have you seen http://www-106.ibm.com/developerworks/library/u-china.html ?
It is a good place to start reading.
--- In firebird-support@yahoogroups.com, dhay@l... wrote:
> Just found out that we can ignore the 4 byte characters - so just have to
> deal with the 1 and 2-byte characters.
Without having really verified this, I assume that GB18030
without the 4-byte sequences is what is known informally
as GBK, an extension of GB2312 which wasn't formally adopted
as a standard.
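To make the 1/2/4-byte split concrete, here is a minimal sketch in Python that classifies GB18030 sequences by their byte ranges (single byte 0x00-0x7F; two-byte with lead 0x81-0xFE and trail 0x40-0x7E or 0x80-0xFE; four-byte with a digit 0x30-0x39 in the second and fourth positions). The function names are my own, not from any library, and the check looks only at the byte structure, not at whether a code is actually assigned.

```python
def gb18030_seq_len(data: bytes, pos: int = 0) -> int:
    """Return 1, 2 or 4: the length of the GB18030 sequence starting at pos.

    Raises ValueError if the bytes do not form a structurally valid sequence.
    """
    b1 = data[pos]
    if b1 <= 0x7F:
        return 1  # ASCII range, single byte
    if 0x81 <= b1 <= 0xFE and pos + 1 < len(data):
        b2 = data[pos + 1]
        if 0x40 <= b2 <= 0xFE and b2 != 0x7F:
            return 2  # two-byte form (the GBK-style range)
        if 0x30 <= b2 <= 0x39 and pos + 3 < len(data):
            b3, b4 = data[pos + 2], data[pos + 3]
            if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
                return 4  # four-byte form
    raise ValueError(f"malformed or truncated sequence at offset {pos}")


def sequence_lengths(data: bytes):
    """Yield the length of each sequence in data, in order."""
    pos = 0
    while pos < len(data):
        n = gb18030_seq_len(data, pos)
        yield n
        pos += n


def uses_four_byte(data: bytes) -> bool:
    """True if data contains at least one four-byte sequence."""
    return any(n == 4 for n in sequence_lengths(data))
```

If you really only support the 1 and 2-byte forms, something like uses_four_byte() could be used to reject or flag input that falls outside that subset.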
For differences between GB2312 and GBK/GB18030, check the
mapping tables of any of the following (or write your own
test program linked against iconv/libiconv):
ICU:
http://oss.software.ibm.com/cvs/icu/charset/data/xml/gb-18030-2000.xml
Mozilla:
http://lxr.mozilla.org/seamonkey/source/intl/uconv/ucvcn/gb180304bytes.uf
GNU libc iconv:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/?cvsroot=glibc
GNU libiconv:
http://www.gnu.org/software/libiconv/
After checking licensing issues, linking against libiconv
may be the shortest path to get the conversion part running.
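As a rough idea of what such a test program could look like, here is a sketch that walks all two-byte codes and compares how two codecs decode them. Note that it uses Python's built-in "gb2312", "gbk" and "gb18030" codecs as a stand-in for iconv/libiconv, so the results reflect Python's tables, which may differ in details from the ICU, Mozilla or libiconv tables listed above. The function names are made up for illustration.

```python
def try_decode(code: bytes, codec: str):
    """Return the decoded string, or None if the codec rejects the bytes."""
    try:
        return code.decode(codec)
    except UnicodeDecodeError:
        return None


def compare_two_byte(codec_a: str, codec_b: str):
    """Compare two codecs over the whole two-byte code space.

    Returns (only_b, differing):
      only_b    - codes that decode in codec_b but not in codec_a
      differing - codes that decode in both, but to different characters
    """
    only_b, differing = [], []
    for lead in range(0x81, 0xFF):          # lead bytes 0x81..0xFE
        for trail in range(0x40, 0xFF):     # trail bytes 0x40..0xFE
            if trail == 0x7F:               # 0x7F is never a trail byte
                continue
            code = bytes([lead, trail])
            a = try_decode(code, codec_a)
            b = try_decode(code, codec_b)
            if a is None and b is not None:
                only_b.append(code)
            elif a is not None and b is not None and a != b:
                differing.append(code)
    return only_b, differing


if __name__ == "__main__":
    for a, b in (("gb2312", "gbk"), ("gbk", "gb18030")):
        only_b, differing = compare_two_byte(a, b)
        print(f"{a} -> {b}: {len(only_b)} codes only in {b}, "
              f"{len(differing)} codes mapped differently")
```

The same loop structure should carry over to a C program calling iconv_open()/iconv() once per code, which would test the tables you would actually link against.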
Regards,
Peter Jacobi