Subject Re: [IB-Architect] Working of indices
Author Ann W. Harrison
At 04:26 PM 9/5/2002 +0200, Arno Brinkman wrote:

>Has every a collation (in same character-set) it's own index ?

No, or not exactly. An index is a fairly simple tree-like
structure made of database pages that are linked down and
across. Each page is made up of elements called "nodes"
which are essentially key/pointer pairs. At the bottom
level, the pointer is a record identifier (rdb$db_key). At
upper levels the pointer indicates a page in the next level
down.

The collating sequence applies only to the fields that make
up the key. A compound key can contain elements with different
collating sequences. So, no, there isn't a different index for
each collating sequence, there's just a different algorithm for
converting the field value into a key.

> I saw that characterset
>ISO8859_1 and collate ISO8859_1 stores data the same
>way as the data is given (AAAA => x41 x41 x41 x41), but
>characterset ISO8859_1 and collate DU_NL (or DE_DE) stored data
>in a very different way and much longer.

Right. Searching this archive or the ib-support archive for
"red bird" will find David Schnepper's explanation of multi-
level collations. Basically if you want the sort to come out
AA aa AB ab AC ac ... rather than AA AB AC aa ab ac, you've
got to overload the characters. Adding accents overloads them
again. If I remember correctly, for western European languages,
the value is converted to upper case, then trailing bytes are
added to indicate which letters are lower case or accented.

Key creation is one of the more interesting parts of the index
code. The key that's stored in an index node has been rearranged
so that the code that walks the index can just compare values
byte by byte without considering the datatype. All numbers,
except 8 byte integers are converted to double precision. The
floating point number is taken apart and reconstructed so it
compares byte wise. Dates are also stored as double precision
numbers and manipulated the same way.

Trailing zeros and spaces are eliminated. For a compound key,
each part is expanded to a multiple of 5 bytes, after the zeros
are eliminated. A segment count is added every fifth byte -
zero for the first segment, 1 for the second and so on. Suppose
the key value was a name:
"HARRISON ", "ANN ", "E ".

Each part would have it's key position stored into every fifth
byte.

"HARR0ISON0 0 0 ", "ANN 1 1 ", "E 2 ".

Then each part will be truncated at a multiple of 5 and the
results concatenated:

"HARR0ISON0ANN 1E 2".

Without all that folderol, truncating trailing spaces would
result in confusion...

"HARRISON ", "ANNE ", " "
and
"HARRISON ", "ANN ", "E "

would be identical. The segment count eliminates that problem -

"HARR0ISON0ANN 1E 2". = Harrison, Ann, E
"HARR0ISON0ANNE1 2". = Harrison, Anne

If you do an index lookup where last = "HARRISON" and
first = "ANN" and middle = "E", the same transformation
is done, so you actually search for HARR0ISON0ANN 1E 2


When the key is stored, bytes at the front that match the previous
key are eliminated and replaced with a byte indicating how many
bytes were identical. To be more precise, here's the structure of
a B-tree node

typedef struct btn {
UCHAR btn_prefix; /* size of compressed prefix */
UCHAR btn_length; /* length of data in node */
UCHAR btn_number [4]; /* page or record number */
UCHAR btn_data [1];

} *BTN;

Take the two keys created above. The first would be stored as

0, 20, <pointer>, "HARR0ISON0ANN 1E 2"

the second would be stored as

13, 7, <pointer>, "E1 2"

if there were a duplicate HARRISON, ANNE, it would
be stored like this

20, 0, <pointer>, ""


Since we've got into code, here's the definition of an index
page, also called, for obscure reasons, a bucket.

typedef struct btr {
struct pag btr_header;
SLONG btr_sibling; /* right sibling page */
SLONG btr_left_sibling; /* left sibling page */
SLONG btr_prefix_total; /* sum of all prefixes on page */
USHORT btr_relation; /* relation id for consistency */
USHORT btr_length; /* length of data in bucket */
UCHAR btr_id; /* index id for consistency */
UCHAR btr_level; /* index level (0 = leaf) */
struct btn btr_nodes [1];
} *BTR;

Every database page has a header which includes the page type,
flags, checksum, and generation. That's what "struct pag btr_header"
means.

The "sibling" is the next page at the same level. Your right
sibling holds the next value greater than the highest value
you've got. Similarly, the left sibling holds the next lower
value. The prefix and data length together produce the total
data length of keys in the bucket. That's important when
splitting buckets.

As the comments indicate, the index page also includes the relation
id and index id, which are useful for diagnosing scrambled indexes.

The level indicates the height in the tree. Level 0 points to
records. Level 1 points to level 0, etc.

>How is an index lookup is done ?

First, the input values are converted to key format, then that
value is compared against the top level node. The top level
node will contain pointers to lower levels. Imagine a three
level index on names. The top bucket will contain key value
pairs like this:

0/34, CARTMAN/35, ELKINS/36, MIDDLEMARCH/37, YANCHEZ/38

You're looking for HARRISON. HARRISON is higher than ELKINS
and lower than MIDDLEMARCH, so you go down a level and look
at page 36. Page 36 will contain keys like this:

ELKINS/55, FETTERMAN/56, GUSTAV/57, HARRIS/58, INGLEMAN/59 ...

so you go down to the next level (which happens to be 0) and
find

HARRIS/1:67:1, HARRISON/1:67:2, HARRISTON/1:67:3, HUTCHINSON/1:67:4...

Note that the zero level pointers are different; they've got
three parts. The first part is the sequence number of the pointer
page for the names table. The second part is the offset on that
page of the number of the data page. The third is the index on
that page that points to the actual record. So, in theory, 1:67:2
points to the one and only HARRISON.

That entry may indicate a records that's been deleted and not
garbage collected, updated changing the key and not garbage
collected, or stored by a concurrent transaction. So, once
a match has been found, Firebird has to read the record to be
sure it is appropriate for the requesting transaction.


This probably creates more confusion than it resolves, so please
ask questions as they occur.



Regards,

Ann
www.ibphoenix.com
We have answers.