Subject Fan out and processor cache friendliness
Author Jan Mikkelsen
Jim Starkey <jas@...> wrote:
>For the record, an Interbase index node consists of the following:
>
> 1 byte of prefix length
> 1 byte of key length (excluding prefix)
> 4 bytes of record number
> variable length key
>
>Each node contains only that part of the key that differs from
>the preceding node. This means that duplicate entries are
>6 bytes long (prefix length == full key length).


In an entirely different context, I've been thinking recently about internal
structure b-tree interior nodes to avoid the sequential search of the page.
Given an 8KB page size, that is probably an average sequential search of
around 3KB per page.

The reason for this thinking is the relative changes in machine performance
characteristics. Main memories are increasing in size, and the speed of
processors is rising rapidly vs. the speed of main memories. This means
that more of (and potentially all of) an index can be main memory resident.
Index page sizes are often a significant percentage of L1 cache size, and
rising.

I've haven't spent a lot of time thinking about it, but my initial thoughts
were to use a form of ternary tree on interior node pages. The pointers
involved tend to increase the average size of an entry, which reduces fan
out. My thoughts are that given a (mostly) memory resident index this
doesn't matter as much because most operations would be limited by main
memory bandwidth anyway (prefetching should remove latency as a problem).
Changing the structure of the page reduces the number of memory locations
touched should improve speed and not have such a large negative effect on
the contents of the L1 cache.

On the encoding front (separate to changing the page structure), the average
size of a key on an interior page is much less than 255 characters and is
probably closer to 2-3 bytes. This can be represented in much less than 8
bits; perhaps a variable length encoding of the key length and the page
number is possible. This would need to be a bit oriented encoding. I don't
think the processor cost of pulling bits out is too significant, especially
given that the processor will probably be waiting for the next cache line
anyway. Purely a way of increasing fanout, although something similar would
be necessary to make the fanout reasonable with a changed page structure.

I don't know whether there would be a win in doing something like this: I
haven't done any specific measurements on the problem. It is just the
observation that databases in general become CPU bound, and larger L1/L2
cache sizes are very beneficial to databases.

What variations from the classic b-tree structure have previously been
considered?

Jan Mikkelsen
janm@...