Subject | Re: Unicode Support |
---|---|
Author | peter_jacobi.rm |
Post date | 2003-10-08T14:55:55Z |
Hi David,
--- "hay77772000" <dhay@l...> wrote:
> Will this then give me full support for dbcs - sorting etc. or do I
> need more than this?
Do you think of any other dbcs than UNICODE? Please specify.
But let me think about the possible UNICODE gotchas:
1. Whereas FB uses UNICODE wide chars a lot internally (as an
intermediate step in charconv), the database stores UTF-8
on disk, and direct API programming will most likely use
UTF-8 too. This is somewhat uncommon on Win32, but not
a real problem.
Getting a middle layer involved (JDBC, ODBC or .NET) will give
you the choice of a wide character interface, AFAIK.
2. Using UTF-8, with 1..3 bytes per char, limits field
sizes and index lengths to only 1/3 of those available
with a plain 8 bit charset.
3. FB doesn't correctly enforce the char length limit in
MBCS fields. So in a char(8) character set UTF_8 field,
you can store up to 24 ASCII characters (8 chars x 3 bytes
of storage). That is really broken.
Don't use it as a feature, as it will be corrected RSN.
4. FB doesn't know about astral plane chars (those outside the
BMP), which should be no problem for 99.99% of all users. (It has
some sort-of support by incorrectly encoding them in 6 bytes, which
should help 99% of the 0.01% affected).
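To make points 2 to 4 concrete, here is a rough sketch in plain Python. It is not FB code. All function names are made up for illustration, the byte-based check is only my guess at why 24 ASCII characters fit into a char(8) field, and the surrogate-by-surrogate form of the 6-byte encoding is likewise an assumption (the usual way such 6-byte sequences come about).

```python
# Illustration only: plain Python, not Firebird code. All names are hypothetical.

BYTES_PER_CHAR = 3  # worst case per character for BMP text in UTF-8 (point 2)


def max_chars(byte_limit: int) -> int:
    """Characters guaranteed to fit into byte_limit bytes (point 2)."""
    return byte_limit // BYTES_PER_CHAR


def fits_by_bytes(text: str, declared_chars: int) -> bool:
    """Assumed cause of point 3: the length is checked against the
    byte buffer (declared_chars * 3) instead of the character count."""
    return len(text.encode("utf-8")) <= declared_chars * BYTES_PER_CHAR


def fits_by_chars(text: str, declared_chars: int) -> bool:
    """The check one would expect: count characters, not bytes."""
    return len(text) <= declared_chars


def six_byte_encoding(ch: str) -> bytes:
    """Assumed form of the 6-byte encoding in point 4: split an astral
    character into its UTF-16 surrogate pair and encode each surrogate
    as its own 3-byte sequence."""
    offset = ord(ch) - 0x10000
    high = chr(0xD800 + (offset >> 10))
    low = chr(0xDC00 + (offset & 0x3FF))
    return (high + low).encode("utf-8", "surrogatepass")


if __name__ == "__main__":
    print(max_chars(252))                        # a 252-byte limit leaves 84 chars
    print(fits_by_bytes("A" * 24, 8))            # byte check accepts 24 ASCII chars
    print(fits_by_chars("A" * 24, 8))            # char check rejects them
    print(len("\U0001D11E".encode("utf-8")))     # correct UTF-8 length
    print(len(six_byte_encoding("\U0001D11E")))  # length of the 6-byte form
```

The 252-byte limit is just an example number, not an FB limit. The last two lines show the difference for one astral char (U+1D11E): 4 bytes in correct UTF-8 versus 6 bytes when each surrogate is encoded separately.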
Regards,
Peter Jacobi