Subject Re: Creating GB18030 character set and collation
Author peter_jacobi.rm
Hi David,

You've seen
which is a good starting point for reading?

> Just found out that we can ignore the 4 byte characters - so just have to
> deal with the 1 and 2-byte characters.

Without having really verified this, I assume that GB18030
without the 4-byte sequences is what is known informally
as GBK, an extension of GB2312 which wasn't formally adopted
as a standard.
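
One quick way to see which characters would be lost by dropping the 4-byte sequences is to look at encoded lengths. The sketch below uses Python's built-in gb18030 and gbk codecs, which is my own choice for illustration and not something mentioned in this thread. The helper names are hypothetical, and Python's gbk tables may differ in detail from other implementations.

```python
def gb18030_length(ch: str) -> int:
    """Return how many bytes GB18030 needs for a single character."""
    return len(ch.encode("gb18030"))


def fits_in_gbk(ch: str) -> bool:
    """Return True if Python's gbk codec can encode the character."""
    try:
        ch.encode("gbk")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # An ASCII letter, a common hanzi, and a character outside the BMP.
    for ch in ["A", "\u554a", "\U00020000"]:
        print(
            f"U+{ord(ch):04X}: {gb18030_length(ch)} byte(s) in GB18030, "
            f"encodable in gbk: {fits_in_gbk(ch)}"
        )
```

Characters that need 4 bytes in GB18030 are the ones that would become unrepresentable if only the 1- and 2-byte forms are supported.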

For differences between GB2312 and GBK/GB18030 check the
mapping tables (or use your own test program linked
against iconv/libiconv) of either:

GNU libc iconv

GNU libiconv
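
As a rough idea of what such a test program could look like, here is a sketch that walks all 2-byte sequences and compares two codecs. Note that it uses Python's built-in codecs instead of iconv/libiconv, so the results reflect Python's mapping tables and would need to be cross-checked against the iconv tables. The function names are hypothetical.

```python
def try_decode(raw: bytes, codec: str):
    """Decode raw with codec; return the character, or None if invalid."""
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError:
        return None
    return text if len(text) == 1 else None


def compare_two_byte(codec_a: str, codec_b: str):
    """Compare two codecs over lead 0x81-0xFE, trail 0x40-0xFE (no 0x7F).

    Returns (only_a, only_b, differ): sequences valid only in codec_a,
    valid only in codec_b, and valid in both but mapped differently.
    """
    only_a, only_b, differ = [], [], []
    for lead in range(0x81, 0xFF):
        for trail in range(0x40, 0xFF):
            if trail == 0x7F:
                continue
            raw = bytes([lead, trail])
            a = try_decode(raw, codec_a)
            b = try_decode(raw, codec_b)
            if a == b:
                continue
            if b is None:
                only_a.append(raw)
            elif a is None:
                only_b.append(raw)
            else:
                differ.append((raw, a, b))
    return only_a, only_b, differ


if __name__ == "__main__":
    for a, b in [("gb2312", "gbk"), ("gbk", "gb18030")]:
        only_a, only_b, differ = compare_two_byte(a, b)
        print(f"{a} vs {b}: {len(only_a)} only in {a}, "
              f"{len(only_b)} only in {b}, {len(differ)} mapped differently")
```

The "mapped differently" list is the interesting one for a collation, since those are code points where the same bytes yield different characters depending on the table used.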

After checking licensing issues, linking against libiconv
may be the shortest path to get the conversion part running.

Peter Jacobi