Subject | RE: [firebird-support] Unicode size |
---|---|
Author | Chad Z. Hower |
Post date | 2004-12-15T19:48:01Z |
:: size of an index, taking in all segments, can't exceed 252 bytes.
This is why I thought 100 would work at first: UTF-16 is 2 bytes per character.
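The arithmetic behind this can be checked quickly. Below is a minimal Python sketch (not Firebird code). It assumes the 252-byte key limit quoted above applies to the declared size, and it ignores any per-segment overhead in the real index key format. The helper names `key_bytes` and `max_indexable_chars` are hypothetical.

```python
# Hypothetical helpers; 252 is the index key limit quoted in this thread.
INDEX_KEY_LIMIT = 252  # bytes, all segments combined

def key_bytes(char_length: int, bytes_per_char: int) -> int:
    """Bytes reserved for a VARCHAR(char_length) column (ignores key overhead)."""
    return char_length * bytes_per_char

def max_indexable_chars(bytes_per_char: int, limit: int = INDEX_KEY_LIMIT) -> int:
    """Longest single-column VARCHAR whose reserved size still fits the limit."""
    return limit // bytes_per_char

for name, bpc in [("UTF-16 (assumed)", 2), ("UNICODE_FSS", 3)]:
    size = key_bytes(100, bpc)
    print(f"{name}: VARCHAR(100) -> {size} bytes, fits: {size <= INDEX_KEY_LIMIT}, "
          f"max indexable length: {max_indexable_chars(bpc)}")
```

At 2 bytes per character a VARCHAR(100) needs 200 bytes and fits. At 3 bytes it needs 300 and does not, which caps a single UNICODE_FSS column at 84 characters.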
:: Think hard, too, about how beneficial any 252-byte column
:: might be as a candidate for an index...
I know they are big indexes - but if I have to look up...
If I put a unique constraint on it - will it take up less storage space etc?
:: Alter table yrtable add tempcol varchar(100) character set
Yes, I wasn't surprised that I could not shrink a field; I was just
surprised by the 300 instead of 200.
:: >Does FB use 3 bytes for each Unicode char?
::
:: Yes, always.
Ouch. :(
:: >What kind of encoding is that? I
:: >had assumed it was using UTF-16, which would be 2 bytes, but 3?
::
:: Nope. It's UNICODE_FSS, which mangles the characters into some
:: uncompressed manifestation of UTF-8, see
:: http://en.wikipedia.org/wiki/UTF-8
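UTF-8 has no fixed "uncompressed" width; it uses 1 to 4 bytes per character. A fixed 3 bytes per character sounds less like UTF-8 and more like a 24-bit ("UTF-24") encoding.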
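For comparison, here is a short Python sketch (not Firebird code) that uses only the standard `str.encode` method to show how many bytes UTF-8 and UTF-16 actually need per character. The helper names are hypothetical. Three bytes is the most UTF-8 needs for any character in the Basic Multilingual Plane (up to U+FFFF), which is presumably why a fixed 3-byte reservation was chosen.

```python
def utf8_len(s: str) -> int:
    """Number of bytes needed to encode s as UTF-8."""
    return len(s.encode("utf-8"))

def utf16_len(s: str) -> int:
    """Number of bytes as UTF-16; utf-16-le avoids the 2-byte BOM that "utf-16" adds."""
    return len(s.encode("utf-16-le"))

samples = {
    "A": "A",                 # ASCII
    "e-acute": "\u00e9",      # Latin-1 range
    "euro sign": "\u20ac",    # elsewhere in the BMP
    "emoji": "\U0001F600",    # outside the BMP
}
for label, ch in samples.items():
    print(f"{label:10} U+{ord(ch):04X}  UTF-8: {utf8_len(ch)}  UTF-16: {utf16_len(ch)}")
```

UTF-8 grows from 1 to 4 bytes across these samples, while UTF-16 stays at 2 bytes for BMP characters and needs 4 (a surrogate pair) beyond it.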
:: It's enough to
:: put anyone off using UNICODE_FSS if there is some
:: alternative. Adriano dos
Aside from using 3 bytes, are there other problems? I need Unicode and
have otherwise been pretty happy with it.