Subject: Re: Key size too big?!
Author: peter_jacobi.rm
Hi Jakub, All,

--- Jakub Hegenbart <Kyosuke@s...> wrote:
> Is it possible to overcome the limit with creating an extra column with
> a truncated version of the field that gets inserted/updated using a
> trigger and can be indexed due to its length?

There is another way to get indices on character columns of
*any* length, which I (alone?) consider more elegant.
Unfortunately, there is a little problem - see below.

Your truncated field will give an index usable for sorting,
STARTING WITH where clauses and the like. But it won't be usable
for UNIQUE constraints, since two distinct values sharing the
same prefix would falsely collide. Whatever the use of a UNIQUE
constraint on VARCHAR(1000) may be, the users keep asking for it.

So just append a hash of the complete field after the
truncated field. Use a known strong hash function and
a hash size estimated from your maximal row count.

E.g. for maximally 4 million rows, a 64 bit hash will make
about one in a million databases have a collision, since by
the (rough) birthday estimate:

2**64 ~= (4E6)**2 * 1E6
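
To make that estimate concrete, here is a small stand-alone
sketch (my own illustration, not part of any collation code)
that evaluates both the rough estimate and the exact birthday
bound:

  #include <cmath>
  #include <cstdio>

  int main()
  {
      /* n rows hashed into 2**64 buckets. The rough estimate
         n**2 / 2**64 drops the factor of two from the exact
         birthday bound n*(n-1) / (2 * 2**64), so it is slightly
         conservative. */
      const double n = 4.0e6;
      const double buckets = std::pow(2.0, 64);

      std::printf("rough bound:    %.2e\n", n * n / buckets);
      std::printf("birthday bound: %.2e\n",
                  n * (n - 1) / (2.0 * buckets));
      return 0;
  }

This prints roughly 8.7E-7 and 4.3E-7, i.e. on the order of
one in a million.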

Now the IMHO neat trick is to put this transformation
into a collation (in fbintl2.dll), so that the engine will
internally use this ersatz field for all comparisons and
indexing, without the need for an extra column and
a trigger.
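
In such a collation, the key transformation could look roughly
like the sketch below. This is an illustration only: the prefix
length, the choice of FNV-1a as the hash and the function name
string_to_key are my assumptions, not the actual fbintl2 entry
points.

  #include <algorithm>
  #include <cstdint>
  #include <cstring>

  /* 64 bit FNV-1a, standing in for the "known strong hash
     function" above; any well-mixed 64 bit hash would do. */
  static std::uint64_t fnv1a_64(const unsigned char* s, std::size_t len)
  {
      std::uint64_t h = 14695981039346656037ULL;
      for (std::size_t i = 0; i < len; ++i) {
          h ^= s[i];
          h *= 1099511628211ULL;
      }
      return h;
  }

  /* Ersatz key = PREFIX_LEN bytes of the string + 8 hash bytes
     of the *complete* string. dst must hold PREFIX_LEN + 8
     bytes. The returned key length is bounded regardless of
     the source string length. */
  std::size_t string_to_key(const unsigned char* src, std::size_t src_len,
                            unsigned char* dst)
  {
      const std::size_t PREFIX_LEN = 32;   /* tune as needed */
      const std::size_t prefix = std::min(src_len, PREFIX_LEN);

      std::memcpy(dst, src, prefix);

      /* append the hash big-endian, so memcmp ordering of the
         key bytes matches numeric ordering of the hash */
      const std::uint64_t h = fnv1a_64(src, src_len);
      for (int i = 0; i < 8; ++i)
          dst[prefix + i] = static_cast<unsigned char>(h >> (56 - 8 * i));

      return prefix + 8;
  }

Note that strings agreeing in the first PREFIX_LEN characters
end up ordered by their hash bytes, i.e. arbitrarily - fine for
equality and UNIQUE, but not a true lexicographic sort of the
tail.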

I've just done a demo version of such a collation, and would
have uploaded it right now, had not some smart guy decided
not to trust the external collation code. After failed tests
of my collation I found in jrd/intl.cpp:

> /* Validity checks on the computed key_length */
>
> if (key_length > MAX_KEY)
> key_length = MAX_KEY;
>
> if (key_length < iLength)
> key_length = iLength;

This effectively says that the engine doesn't believe the
collation DLL when it reports a key size shorter than the
string size.

So, at the moment, this approach is only open to users
compiling their own engine.
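
For those who do, the change is conceptually small. One
conceivable (untested, sketch-only) edit to the check quoted
above would be to keep only the upward clamp:

  /* trust the key_length reported by the collation,
     clamping only against the engine-wide maximum */
  if (key_length > MAX_KEY)
      key_length = MAX_KEY;

  /* the second clamp, raising key_length back up to
     iLength, is the one that defeats short ersatz keys
     and would have to go */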

Best Regards,
Peter Jacobi