firebird-support - Re: Unicode size

Subject	Re: Unicode size
Author	Adam
Post date	2004-12-14T04:23Z

I think Helen answered your question pretty well, I just had a thought
for you though. I agree with Helen that large fields are generally not
good candidates for indices.

But if you only want the email address indexed so a record can be
found efficiently based on their email address. If this is the case,
you could store say an MD5 or SHA hash of the email address field in a
separate indexed field. You may want to find an even simpler system,
even CRC might be good enough. It doesn't have to be 100% unique, but
the hash should be only a couple of characters and only leave a couple
of possibilities per hash.

So your table could look like:

contact (id, name, email, emailhash)

Your select statement might be

select ID, Name
from contact
where emailhash = '32FA44D'
and emailaddress = 'firebird-support@yahoogroups.com'

You could use a trigger to fill the emailhash on the before
insert/update events. A UDF function could be written to hash the
strings. SHA-1 and md5 are probably overkill for your purpose. CRC may
be more like what you are after.

Adam

--- In firebird-support@yahoogroups.com, "Chad Z. Hower aka Kudzu"
<chad-jm@h...> wrote:

> I have a DB that is unicode except a few fields like email,

password, etc

> that I have deemed can/should be ascii.
>
> I had an email field varchar 100 that was unicode. I tried to index

it, but

> it said index was too big. So I reduced it to 80. Same. So then I

tried ot

> convert it to ASCII, 100. It tehn said "Field must be at least 300

chars".

>
> Does FB use 3 chars for each Unicode char? What kind of encoding is

that? I

> has assumed it was using UTF 16 which would be 2 bytes, but 3?