Subject | Re: Unicode size |
---|---|
Author | Adam |
Post date | 2004-12-14T04:23Z |
I think Helen answered your question pretty well, I just had a thought
for you though. I agree with Helen that large fields are generally not
good candidates for indices.
But if you only want the email address indexed so a record can be
found efficiently based on their email address. If this is the case,
you could store say an MD5 or SHA hash of the email address field in a
separate indexed field. You may want to find an even simpler system,
even CRC might be good enough. It doesn't have to be 100% unique, but
the hash should be only a couple of characters and only leave a couple
of possibilities per hash.
So your table could look like:
contact (id, name, email, emailhash)
Your select statement might be
select ID, Name
from contact
where emailhash = '32FA44D'
and emailaddress = 'firebird-support@yahoogroups.com'
You could use a trigger to fill the emailhash on the before
insert/update events. A UDF function could be written to hash the
strings. SHA-1 and md5 are probably overkill for your purpose. CRC may
be more like what you are after.
Adam
--- In firebird-support@yahoogroups.com, "Chad Z. Hower aka Kudzu"
<chad-jm@h...> wrote:
for you though. I agree with Helen that large fields are generally not
good candidates for indices.
But if you only want the email address indexed so a record can be
found efficiently based on their email address. If this is the case,
you could store say an MD5 or SHA hash of the email address field in a
separate indexed field. You may want to find an even simpler system,
even CRC might be good enough. It doesn't have to be 100% unique, but
the hash should be only a couple of characters and only leave a couple
of possibilities per hash.
So your table could look like:
contact (id, name, email, emailhash)
Your select statement might be
select ID, Name
from contact
where emailhash = '32FA44D'
and emailaddress = 'firebird-support@yahoogroups.com'
You could use a trigger to fill the emailhash on the before
insert/update events. A UDF function could be written to hash the
strings. SHA-1 and md5 are probably overkill for your purpose. CRC may
be more like what you are after.
Adam
--- In firebird-support@yahoogroups.com, "Chad Z. Hower aka Kudzu"
<chad-jm@h...> wrote:
> I have a DB that is unicode except a few fields like email,password, etc
> that I have deemed can/should be ascii.it, but
>
> I had an email field varchar 100 that was unicode. I tried to index
> it said index was too big. So I reduced it to 80. Same. So then Itried ot
> convert it to ASCII, 100. It tehn said "Field must be at least 300chars".
>that? I
> Does FB use 3 chars for each Unicode char? What kind of encoding is
> has assumed it was using UTF 16 which would be 2 bytes, but 3?