firebird-architect - Re: [IB-Architect] Fan out and processor cache friendliness

Subject	Re: [IB-Architect] Fan out and processor cache friendliness
Author	Jim Starkey
Post date	2000-04-07T22:48:27Z

At 08:29 AM 4/8/00 +1000, Jan Mikkelsen wrote:

>
>On the encoding front (separate to changing the page structure), the average
>size of a key on an interior page is much less than 255 characters and is
>probably closer to 2-3 bytes. This can be represented in much less than 8
>bits; perhaps a variable length encoding of the key length and the page
>number is possible. This would need to be a bit oriented encoding. I don't
>think the processor cost of pulling bits out is too significant, especially
>given that the processor will probably be waiting for the next cache line
>anyway. Purely a way of increasing fanout, although something similar would
>be necessary to make the fanout reasonable with a changed page structure.
>
>I don't know whether there would be a win in doing something like this: I
>haven't done any specific measurements on the problem. It is just the
>observation that databases in general become CPU bound, and larger L1/L2
>cache sizes are very beneficial to databases.
>
>What variations from the classic b-tree structure have previously been
>considered?
>

First, as you may already know, the keys are pre-mulched so that
keys can be compared as unsigned byte streams without regard to
type, length, or single/multiple field (there's an interesting
trick if anybody's interested). The idea was to remove all
complexity from the page scan code to make the inner loop as
fast as possible.

Second, the prefix compression scheme allows most nodes to be
skipped based on prefix length. For example, if the last
key comparison was unequal in the second byte, all node with
a prefix of two or greater can be skipped without further
ado. This translates into an average number of memory references
per node somewhere between two and three (one for the prefix,
one possibly for a key byte, and one for the key length to
find the next node). I'm quite sure it can't be made faster.

Something that I've been sorely tempted to do is to use a
variable length prefix/length fields to support keys greater
than 255 bytes. It would be computationally cheaper to make
the prefix/length fields two bytes each, but the wasted bits
bug me. A variable length encoding could store lengths up
to 127 in a single byte (which is really just about everything),
requiring a second byte for only very long keys at the expense
of twice the number of machine instructions per node.

Jim Starkey