Subject: Re: Index structures
Author: Alexander Klenin
>>1) create "partially compressed" indexes along the lines
>>of MPEG algorithm, where every Nth index value is stored
>>completely, and values in between are compressed.
>
>That doesn't help at all. Values are NOT decompressed
>for comparison. The length of the compressed prefix is part
>of the search algorithm.
[explanation skipped]
Thanks for taking the time to explain, but it seems that my
initial understanding of how compression works was correct
;)

>>The search should find the nearest uncompressed value, then
>>scan next N values in linear time. By adjusting N one can
>>balance search performance vs memory increase.
>
>What Arno and Nicholay were proposing was the ability to
>do a binary search of an index bucket which requires (to
>my mind) that all nodes be the same length. This is not
>a MPEG type problem where decompression is done in real
>time and is costly. Comparing compressed keys is cheaper
>than comparing non-compressed keys. A binary search is
>generally faster than a scan.
I did not suggest that the compression costs are anything like
MPEG's; I just used it as an illustration. OK, let me explain
my thinking more carefully:

PROBLEM 1:
A linear search for a key within an index page is slow when
there are many repeated lookups in a small index (often
the case with the current optimizer ;). At least, that is what
Arno said (and I assume it is true). A binary search would
help here, but it is precluded by the key compression.
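To make the obstacle concrete, here is a rough C++ sketch of a
prefix-compressed page. The layout and names are invented for
illustration only and are not the actual on-disk format:

  #include <cstddef>
  #include <string>
  #include <vector>

  // Hypothetical layout: each node stores the number of leading bytes
  // it shares with the previous key, plus the bytes that differ.
  struct Node {
      unsigned prefix;       // bytes shared with the previous key
      std::string suffix;    // remaining (differing) bytes
  };

  // Reconstructing the key of node i requires walking from the start
  // of the page, because each key is defined relative to its
  // predecessor.
  std::string keyAt(const std::vector<Node>& page, std::size_t i)
  {
      std::string key;
      for (std::size_t j = 0; j <= i; ++j)
          key = key.substr(0, page[j].prefix) + page[j].suffix;
      return key;   // O(i) work per probe
  }

A binary search probe into the middle of the page would need exactly
this kind of walk, which defeats its purpose.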
SOLUTION 1:
As proposed by Arno: disable key compression altogether
for small keys (for some definition of "small"). This
(apparently) solves problem 1, but leads to

PROBLEM 2:
Uncompressed indexes take more space, thereby slowing
lookups when indexes are large and do not fit in the cache.
This slowdown is the reason why Jim Starkey opposes
solution 1, so I thought that it might be mitigated by
SOLUTION 2:
Force insertion of an uncompressed key (in the form "0abcd",
as per your example) after every N records, even if the actual
key values allow compression.
The search algorithm can then work as follows:
1) do a binary search on the uncompressed values until the key
is found or the interval length is less than N;
2) scan the remaining interval with the current
(compression-aware) method.
Given a reasonable (or perhaps tunable) value of N, it is
possible to avoid index inflation while still keeping the page
search time essentially logarithmic.
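To show what I mean, here is a rough sketch building on the
hypothetical Node layout above; it assumes, purely for illustration,
that the page also keeps an array with the positions of the
uncompressed nodes (which is already one possible answer to the
caveat below):

  // Sketch only; all names are invented.  Uses the Node struct (and
  // headers) from the previous sketch.  'anchors' holds the indices
  // of the nodes stored uncompressed (every Nth one, plus the first
  // node, so it is never empty); 'nodes' is the whole page.
  std::size_t findNode(const std::vector<Node>& nodes,
                       const std::vector<std::size_t>& anchors,
                       const std::string& target)
  {
      // 1) binary search over the uncompressed keys only
      std::size_t lo = 0, hi = anchors.size();
      while (hi - lo > 1) {
          const std::size_t mid = (lo + hi) / 2;
          if (nodes[anchors[mid]].suffix <= target)  // prefix == 0 here,
              lo = mid;                              // suffix is the whole key
          else
              hi = mid;
      }

      // 2) compression-aware linear scan of at most N nodes; the
      //    anchor node has prefix 0, so the first pass rebuilds its
      //    key in full
      const std::size_t last =
          (hi < anchors.size()) ? anchors[hi] : nodes.size();
      std::string key;
      for (std::size_t i = anchors[lo]; i < last; ++i) {
          key = key.substr(0, nodes[i].prefix) + nodes[i].suffix;
          if (key >= target)
              return i;       // first node not less than the target
      }
      return last;
  }

Note that step 1 never has to reconstruct a key, because the anchor
keys are stored whole; only step 2 pays the usual sequential cost,
and it is bounded by N instead of by the page size.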
However, there is a
CAVEAT:
Binary search should be able to quickly locate the
uncompressed key nearest to a given offset on the page.
This can be achieved by some tricks with alignment and/or
a few extra bits per key. Alternatively, a "real" binary
tree-like structure can be added to every index page. If
anyone cares, I can explain those, but first please
consider this idea in general.

Did I really miss something?

Ask