Subject Re[2]: [Firebird-Architect] Re: Index structures
Author Ann W. Harrison
At 02:36 PM 6/9/2003 -0300, Daniel Rail wrote:

>What about having an Index Allocation Table (IAT) for each index. In
>the IAT, each node is represented by it's ID, how to skip to the next
>node and what is the first entry in the node. So the algorithm
>basically stays the same, except it is doing the preliminary search in
>the IAT, instead of loading all of the pages/nodes. Once it found in
>which node it has to perform a more detailed search, that node is
>fetched from the hard-disk, unless it's already cached.

Err.... We don't load the whole index into cache as a unit and
scan it. That would be horribly inefficient. The index is not
a linear list - it's a tree.

Supposing that the database has just been opened and nothing
is in cache, this is the way an indexed lookup works. The
system reads the index root page for the table. The index root
page contains the page number of the top of the tree that represents
that index. Except for very small indexes the top of the tree
contains nodes that point down in the tree rather than pointing
directly at records.

Suppose you're looking for 'def'. You look in the top level node and
find pointers to '0aaa', '1cd', '0bhi', '0cdg', '1zg', '0dge' etc.
You know that the record you're looking for will be found on the
page that starts '1zg', if it exists at all. The leading 0's
indicate no prefix compression. A leading 1 indicates that one
byte was suppressed; a leading 2 that two bytes were suppressed,

Note that I've used example of keys that are all three letters.
That's a simplification for didactic clarity. In fact, index
keys vary in length, due to suffix compression so a more realistic
example would be '0a', '1cd', '0bit', '0cdgollum', '7zg', '0dge'

With me so far? We've decided that '1zg' is the best fit.
Beside that key is a page number. In a two level index, that
page contains all the key values between 'czg' and 'dge' -
which should include our record - if it exists. Only the
top node and the node containing the record pointer are
scanned. On average, each is only half scanned.

>I know that there is more to this.

Oh my yes.

> Especially if there's as been an
>insert or update, that could affect the entries in the nodes.

Or a deletion - or you're working with a unique index which must
allow duplicates as long as they're not visible to any one transaction.
The index handling is not simple. And then there are bucket splits

>referring to the example above: if the node that is fetched doesn't
>contain the index entry, because is was bumped to another node due to
>an insert or an update, then fetch the IAT again and start the search

How can you tell the difference between something having gone missing
due to changes and something that doesn't exist at all?

>Even after performing the algorithm a second time, you're
>still only fetching a total of 4 nodes/pages compared to approx. 50.

A three level index can identify more than 30 million records
with a 4086 page size. You find the target record (or not) in three
page reads and a scan of (on average) one and a half pages.


We have answers.