Subject | Re: [Firebird-Architect] Re: Index structures |
---|---|
Author | Ann W. Harrison |
Post date | 2003-06-09T16:54:38Z |
At 06:50 PM 6/8/2003 +0400, Alexander Klenin wrote:
>Well, I am NOT familiar with Firebird internals, but out
>of general considerations, two solutions come to mind:
>
>1) create "partially compressed" indexes along the lines
>of MPEG algorithm, where every Nth index value is stored
>completely, and values in between are compressed.

That doesn't help at all. Values are NOT decompressed for
comparison. The length of the compressed prefix is part
of the search algorithm.
For example, suppose you're looking for 'def' and the index bucket
starts with '0abc' (meaning no prefix compression and a value
of 'abc'). If the next node you see is '2d' (meaning the first two
bytes are identical to the previous node and the third byte is a
'd'), then just by reading the prefix compression length you can
determine that there is no match.
Anything that has a non-zero compression length is not a
candidate, and the node header describes how to skip to
the next node. The next node with no prefix compression
may be '0bcd'. Again, you skip across nodes until you come
to a node with no prefix compression. This time you might
find '0daa'. Then you skip nodes that have a prefix
compression of more than one until you find '1ea'. Then
you look at nodes with a prefix compression of 2 until
you find '2f', which is a hit, or '2g', '1..', or '0...',
any of which indicates that there is no match. (The dots
in the last two key representations are meant to indicate
any character - you don't even need to look at the byte
after the number.)
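
To make that walk-through concrete, here is a minimal C sketch of the
same skip-scan. The node_t struct, find_key, and the in-memory layout
are invented for illustration - this is not Firebird's actual on-page
node format or code, just the logic described above.

```c
#include <stddef.h>

/* Illustration only -- not Firebird's on-page layout.  A node is the
 * prefix-compression length (bytes shared with the previous node's
 * key) plus the bytes that differ from the previous key. */
typedef struct {
    size_t prefix;       /* bytes identical to the previous node's key */
    const char *suffix;  /* remaining bytes of this node's key */
    size_t suffix_len;
} node_t;

/* Exact-match search over one bucket's nodes.  `matched` is how many
 * leading bytes of the search key are known to equal the corresponding
 * bytes of the previous node's key.  Returns the node index or -1. */
static int find_key(const node_t *nodes, size_t count,
                    const char *key, size_t key_len)
{
    size_t matched = 0;

    for (size_t i = 0; i < count; i++) {
        size_t p = nodes[i].prefix;

        if (p > matched)          /* shares too much with a smaller key, */
            continue;             /* so it is still smaller: skip it     */
        if (p < matched)          /* diverges where we already matched,  */
            return -1;            /* so it is already greater: no match  */

        /* p == matched: the only case where suffix bytes are compared */
        const char *s = nodes[i].suffix;
        size_t j = 0;
        while (j < nodes[i].suffix_len && matched + j < key_len
               && s[j] == key[matched + j])
            j++;

        if (j == nodes[i].suffix_len && matched + j == key_len)
            return (int) i;                     /* exact hit, e.g. '2f' */
        if (j == nodes[i].suffix_len
            || (matched + j < key_len
                && (unsigned char) s[j] < (unsigned char) key[matched + j])) {
            matched += j;                       /* node key still smaller */
            continue;
        }
        return -1;                              /* node key is greater */
    }
    return -1;
}
```

Fed the sequence from the example ('0abc', '2d', '0bcd', '0daa', '1ea',
'2f') and the key 'def', it skips every node whose prefix length
disagrees with the bytes already matched and only compares suffix bytes
when the prefix length is exactly right - no decompression anywhere.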
>The search should find the nearest uncompressed value, then
>scan next N values in linear time. By adjusting N one can
>balance search performance vs memory increase.

What Arno and Nicholay were proposing was the ability to
do a binary search of an index bucket, which requires (to
my mind) that all nodes be the same length. This is not
an MPEG-type problem where decompression is done in real
time and is costly. Comparing compressed keys is cheaper
than comparing non-compressed keys. A binary search is
generally faster than a scan.
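
For contrast, a binary search over fixed-length nodes might look like
the sketch below. Nothing here comes from Arno's or Nicholay's actual
proposal; KEY_BYTES, binary_find, and the flat key array are made-up
illustrations of the "all nodes the same length" requirement.

```c
#include <string.h>

/* Illustration only.  KEY_BYTES is a made-up parameter: every key in
 * the bucket is stored (or padded) to this fixed width, so node i
 * lives at keys + i * KEY_BYTES and the middle of any range can be
 * located by arithmetic alone. */
#define KEY_BYTES 16

static long binary_find(const unsigned char *keys, size_t count,
                        const unsigned char key[KEY_BYTES])
{
    size_t lo = 0, hi = count;          /* half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(keys + mid * KEY_BYTES, key, KEY_BYTES);
        if (cmp == 0)
            return (long) mid;          /* exact hit */
        if (cmp < 0)
            lo = mid + 1;               /* middle key is smaller */
        else
            hi = mid;                   /* middle key is greater */
    }
    return -1;                          /* no match */
}
```

That is roughly log2(n) memcmp calls against n very cheap prefix-length
checks in the scan above; the price is that uncompressed, fixed-width
keys pack far fewer nodes into each bucket.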
We ought to consider the cost of a second type of index
before jumping into this.
Regards,
Ann