Subject | Re: [firebird-support] UTF8 in firebird ? |
---|---|
Author | Ann Harrison |
Post date | 2012-01-06T23:14:14Z |
Hi Stephane,
RLE doesn't work well with large fields that are mostly unused. It's
better than no compression at all, but not great when more than 75%
of every field is unused. Most applications used seven-bit
ASCII when Firebird's compression was developed.
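
To make the cost of oversized fields concrete, here is a rough sketch in Python. It is not Firebird's actual compression code. It assumes a simple byte-oriented RLE in which a run of up to 127 identical bytes costs two bytes and a stretch of literal bytes costs one count byte plus the bytes themselves. It also assumes a UTF8 field reserves four bytes per declared character and that the unused part is zero-filled. The names `rle_size` and `stored_size` are made up for illustration.

```python
def rle_size(data: bytes, max_run: int = 127) -> int:
    """Encoded size under a simplified RLE (an assumption, not Firebird's code)."""
    size = 0
    i = 0
    n = len(data)
    while i < n:
        # Measure the run of identical bytes starting at position i.
        j = i + 1
        while j < n and data[j] == data[i] and j - i < max_run:
            j += 1
        if j - i >= 3:
            # A run is stored as a count byte plus the repeated value.
            size += 2
            i = j
        else:
            # Gather literal bytes until a run of three begins or the chunk is full.
            start = i
            while i < n and i - start < max_run:
                if i + 2 < n and data[i] == data[i + 1] == data[i + 2]:
                    break
                i += 1
            # Literals are stored as a count byte plus the bytes themselves.
            size += 1 + (i - start)
    return size


def stored_size(value: str, declared_chars: int, bytes_per_char: int = 4) -> int:
    """Compressed size of one value in a field declared with declared_chars characters."""
    raw = value.encode("utf-8")
    # Assumption: the field buffer is zero-padded to its full declared width.
    buffer = raw.ljust(declared_chars * bytes_per_char, b"\x00")
    return rle_size(buffer)


for declared in (20, 100, 1000):
    print(declared, stored_size("Stephane", declared))
```

Under these assumptions the same eight-character value costs 11 bytes in a 20-character field, 17 bytes in a 100-character field, and 73 bytes in a 1000-character field. The padding never compresses to nothing, because each run can cover only a limited number of bytes.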
Unfortunately, there are more than 256 "characters" in use in
Europe. The part of UTF8 that fits in a single byte is only the 128
seven-bit ASCII characters, so even if someone were willing to create
such a bastard character set it wouldn't meet your requirements. You've
got some choices. You can pick one of the almost OK character sets. You
can use UTF8, avoid overspecifying field lengths, and choose field
lengths that are likely to compress well with Firebird's RLE when
they're empty. Or, you can use the fairly well defined interfaces for
character sets and collations and define your own - or hire someone to
do it for you.
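
The point about the single-byte part of UTF8 can be checked with a few lines of Python. This is only an illustration using Python's built-in encoder. The sample characters are arbitrary picks from European alphabets, not a list taken from the original question.

```python
# Arbitrary European sample characters, written as escapes to stay ASCII-safe.
samples = {
    "e": "e",                # plain ASCII letter
    "e-acute": "\u00e9",     # French, among others
    "sharp-s": "\u00df",     # German
    "l-stroke": "\u0142",    # Polish
    "omega": "\u03a9",       # Greek
    "zhe": "\u0416",         # Cyrillic
    "euro": "\u20ac",        # currency sign
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# Count the code points below U+0800 that encode to exactly one byte.
single_byte_count = sum(
    1 for cp in range(0x800) if len(chr(cp).encode("utf-8")) == 1
)

for name, length in byte_lengths.items():
    print(name, length)
print("single-byte code points:", single_byte_count)
```

Only the plain ASCII letter takes one byte. The accented, Greek, and Cyrillic letters take two, the euro sign takes three, and exactly 128 code points fit in a single byte.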
Good luck,
Ann