Subject Re: [Firebird-Architect] Re: Index structures
Author Ann W. Harrison
At 06:50 PM 6/8/2003 +0400, Alexander Klenin wrote:

>Well, I am NOT familiar with Firebird internals, but out
>of general considerations, two solutions come to mind:
>1) create "partially compressed" indexes along the lines
>of MPEG algorithm, where every Nth index value is stored
>completely, and values in between are compressed.

That doesn't help at all. Values are NOT decompressed for
comparison. The length of the compressed prefix is part
of the search algorithm.

For example, if you're looking for 'def' and the index bucket
starts with '0abc' (meaning no prefix compression and a value
of 'abc') then you see 2d (meaning the first two bytes are
identical to the previous node and the third byte is a 'd'),
just by reading the prefix compression length you can determine
that there is no match.

Anything that has a non-zero compression length is not a
candidate, and the node header describes how to skip to
the next node. The next node with no prefix compression
may be '0bcd'. Again, you skip across nodes until you come
to a node with no prefix compression. This time you might
find '0daa'. Then you skip nodes that have a prefix
compression more than one until you find '1ea'. Then
you look at nodes with a prefix compression of 2 until
you find '2f', which is a hit, or '2g', '1..', or '0...',
any of which indicates that there is no match. (the dots
in the last two key representations are meant to indicate
any character - you don't even need to look at the byte
after the number.)

>search should find the nearest uncompressed value, then
>scan next N values in linear time. By adjusting N one can
>balance search performance vs memory increase.

What Arno and Nicholay were proposing was the ability to
do a binary search of an index bucket which requires (to
my mind) that all nodes be the same length. This is not
a MPEG type problem where decompression is done in real
time and is costly. Comparing compressed keys is cheaper
than comparing non-compressed keys. A binary search is
generally faster than a scan.

We ought to consider the cost of a second type of index
before jumping into this.