Subject Re: [firebird-support] Allowing Space for Unicode
Author Lee Jenkins
Helen Borrie wrote:
>
>
> At 04:18 AM 30/01/2009, you wrote:
>
> >Hi all,
> >
> >Have heard a rumor that America is not the only country in the world ;)
> >
> >Can anyone offer general rules for setting size attributes of VARCHAR
> fields?
> >For instance, if I believe that Varchar(128) is sufficient for
> english, then
> >Varchar(256) probably should be used to allow for unicode...
>
> No "rules" needed. If varchar(128) is sufficient for English then it
> will be sufficient for the same data in other languages. That's because
> the numbers in char() and varchar() specify the number of characters,
> not the number of bytes.
>
> Charset NONE (alias ASCII7, USASCII) is a single-byte charset (SBCS).
> Varchar(128) allows up to 128 bytes for a SBCS. Fb supports some
> multi-byte charsets (MBCS) that are not unicode, e.g., SJIS_208 is a
> charset for encoding Japanese characters, 2 bytes for each character, so
> a varchar(128) will have 256 bytes capacity. UTF8 is a MBCS that can
> store up to 4 bytes per character, capacity 512 bytes for your
> varchar(128) in that charset.
>
> Where you will have to watch out is with indexes. Their limits are
> determined by byte length, not number of characters. So, for example, in
> a database with an 8KB page size, an index key can be at most 2048
> bytes. 2048/4 = 512 characters. But it doesn't stop there: non-binary
> collations (essential for unicode and for most non-ascii charsets) eat
> more bytes...lots more in most cases. Multi-segment keys eat even more.
> For example, the maximum size of an indexed varchar in UTF8 in a DB with
> an 8Kb page size is 40 characters!
>

Ouch! I will watch out for this.

Thanks Helen and rest for clearing this up.

--
Warm Regards,

Lee