Subject | Re: [IB-Architect] Fan out and processor cache friendliness
---|---
Author | Jan Mikkelsen
Post date | 2000-04-07T23:01:34Z

Jim Starkey <jas@...> wrote:
>First, as you may already know, the keys are pre-mulched so that
>keys can be compared as unsigned byte streams without regard to
>type, length, or single/multiple field (there's an interesting
>trick if anybody's interested). The idea was to remove all
>complexity from the page scan code to make the inner loop as
>fast as possible.

Yes. However, given the change in relative processor/memory speeds the real
cost is in touching the memory, and some extra processor time isn't the end
of the world, and may not even make any difference. I understand how the
parameters were different when it was first designed and implemented.
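
(For anyone curious about the general idea, though not the particular
trick Jim is alluding to, which he hasn't spelled out: a signed integer
segment can be made byte-comparable roughly like this. A sketch in C,
names mine.)

    #include <stdint.h>

    /* Flip the sign bit and store big-endian, so that plain unsigned
       byte comparison (memcmp) orders the encoded keys the same way
       as numeric comparison orders the original integers. */
    static void encode_int32_key(int32_t value, unsigned char out[4])
    {
        uint32_t u = (uint32_t) value ^ 0x80000000u;  /* flip sign bit */
        out[0] = (unsigned char) (u >> 24);           /* big-endian:   */
        out[1] = (unsigned char) (u >> 16);           /* most          */
        out[2] = (unsigned char) (u >> 8);            /* significant   */
        out[3] = (unsigned char) u;                   /* byte first    */
    }

With a transform like that per type, the page scan never needs to know
what it is comparing.
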
>Second, the prefix compression scheme allows most nodes to be
>skipped based on prefix length. For example, if the last
>key comparison was unequal in the second byte, all nodes with
>a prefix of two or greater can be skipped without further
>ado. This translates into an average number of memory references
>per node somewhere between two and three (one for the prefix,
>one possibly for a key byte, and one for the key length to
>find the next node). I'm quite sure it can't be made faster.

Yes, I understand that entries can be skipped. The problem is cache lines.
If you touch one memory location, the whole cache line is fetched, and room
has to be found in the processor's L1 cache. Given an average size of an
entry of (say) 7 bytes, you end up touching every cache line. I'm trying to
think of a scheme that touches fewer cache lines. The problem with the
solutions I have is that they reduce fan out. For small indexes it is
probably still a win, but for large indexes the increased disk I/O kills any
memory bandwidth win.
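
To make the cache behaviour concrete, the scan being described looks
roughly like this (my sketch, assuming a [prefix byte][length byte][key
bytes] node layout, not the actual page format):

    #include <stddef.h>

    /* If the previous key comparison went unequal at (1-based) byte N,
       any node whose prefix is >= N repeats that mismatching byte and
       can be skipped.  But note that even a skipped node has its two
       header bytes read to find the next node, so the walk is strictly
       sequential: with ~7-byte nodes, every 64-byte cache line on the
       page is pulled into L1 whether or not key bytes are compared. */
    static const unsigned char *
    skip_nodes(const unsigned char *node, const unsigned char *end,
               unsigned mismatch_at)
    {
        while (node < end) {
            unsigned prefix = node[0];   /* bytes shared with predecessor */
            unsigned length = node[1];   /* key bytes stored in this node */
            if (prefix < mismatch_at)
                return node;             /* must compare this node's bytes */
            node += 2 + length;          /* header touched even when skipping */
        }
        return end;
    }
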
>Something that I've been sorely tempted to do is to use
>variable length prefix/length fields to support keys greater
>than 255 bytes. It would be computationally cheaper to make
>the prefix/length fields two bytes each, but the wasted bits
>bug me. A variable length encoding could store lengths up
>to 127 in a single byte (which is really just about everything),
>requiring a second byte for only very long keys at the expense
>of twice the number of machine instructions per node.

In this context machine instructions really are free on modern machines,
because with prefetching, you can do work while you wait for the next cache
line to become L1 cache resident. Variable length encodings are becoming
advantageous.
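
For what it's worth, the encoding being weighed would look something
like this (again my sketch, names mine): one byte covers lengths up to
127, and the high bit flags a second byte for longer keys.

    #include <stddef.h>

    static size_t encode_length(unsigned length, unsigned char *out)
    {
        if (length < 128) {
            out[0] = (unsigned char) length;              /* common case */
            return 1;
        }
        out[0] = (unsigned char) (0x80 | (length >> 8));  /* flag + high bits */
        out[1] = (unsigned char) (length & 0xFF);         /* low byte */
        return 2;                                         /* up to 32767 */
    }

    static size_t decode_length(const unsigned char *in, unsigned *length)
    {
        if ((in[0] & 0x80) == 0) {                        /* one-byte form */
            *length = in[0];
            return 1;
        }
        *length = ((unsigned) (in[0] & 0x7F) << 8) | in[1];
        return 2;
    }

The decode is a test and a branch more than the fixed one-byte case,
which is exactly the kind of work that hides behind a cache miss.
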
Jan Mikkelsen
janm@...