Subject: Re: [firebird-support] Index Stats
Author: Aage Johansen
On Tue, 12 Aug 2003 21:30:00 +0000 (UTC), Don Gollahon wrote:

> I need a little more info than the IB docs are giving me on the meaning
> of the Index Statistics.

I've taken the liberty of copying an old post by Ann on this subject - you
can find even more (and with more context) in the archives of the old MERS
list. Note that it is dated 1998, and an answer to a specific problem -
nevertheless, one might learn from it.

8 May 1998 15:50:36 Ann Harrison <harrison@...>

Re: Understanding Index Stats (really long)

At 08:04 PM 5/6/98 -0400, Bill Meaney wrote:
>Could someone direct me to where I might get more information on how
>indexing works in IB.

Well, here, I guess.

>I've looked at the whole 1/2 page of online help about database stats but
>I'm still having trouble understanding what some of the attributes mean.

The description in the Operations Guide is completely correct, but
perhaps a little terse (and I think the reference to sorting is
misleading - but hey, it's correct as far as it goes.)

>Has Ann written the chapter on Understanding IB Indexes for her book yet?

No, but you're about to see a draft, free and focused on your very own
problem.

>Or are there any books in the bookstores that
>may help me to understand the way IB approaches indexing?

You might try Knuth's Sorting and Searching - that was part of the source
for the InterBase indexing scheme. Another part came from the Data Computer
project at CCA.

>I was hoping to be able to look at the stats for a table like the ones
>below and tell whether it was time to rebuild the indexes.

The stats would tell you, but unfortunately, your particular stats don't,
because rebuilding isn't going to solve the problem they do expose. Let
me go directly to the stats, then come back to your performance problems.
So as not to leave you panicked, I'll tell you in advance that I think
you can make the database considerably faster.

>CPTVAL (39)

The table name is CPTVAL, its relation_id is 39. You probably don't
care at all about the relation_id, but the internal code generator does
and this report is created from the same sources used for code generation.

> Primary pointer page: 293, Index root page: 294

These are table attributes. Rows in a table are located through a three
level index. The first part indicates a page of pointers to pages, the
second the offset on that page of the data page, and the third the index
slot on the data page that points to the actual row. When a table is
created, it immediately gets the initial pointer page. If you were trying
to put a broken database back together - a process I used to call "making
hamburger into cows" - knowing the initial pointer page location can help.
From your point of view, it doesn't help at all.
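The three-level lookup just described can be sketched as plain arithmetic. A minimal sketch, assuming made-up fan-out constants - the real values depend on page size and are not the numbers below:

```python
# Illustrative only: decompose a record number into the three lookup
# levels described above (pointer page, slot on that page, index slot
# on the data page). Both constants are invented placeholders.
SLOTS_PER_POINTER_PAGE = 1000   # data-page pointers per pointer page
RECORDS_PER_DATA_PAGE = 50      # index slots per data page

def locate(record_number):
    # Which data page, and which slot on it, holds the row.
    data_page_index, index_slot = divmod(record_number, RECORDS_PER_DATA_PAGE)
    # Which pointer page, and which slot on it, points to that data page.
    pointer_page, page_slot = divmod(data_page_index, SLOTS_PER_POINTER_PAGE)
    return pointer_page, page_slot, index_slot

print(locate(123456))  # -> (2, 469, 6)
```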

The index root page is also created when the table is defined. That's why
it happens to be adjacent to the pointer page. You'll find that the primary
pointer page and the index root page are almost always adjacent. The index
root page points to the apex of each index. Again, its location doesn't
help you at all, but the internals use it a lot.

> Data pages: 1385, data page slots: 1385, average fill: 69%

The table fills 1385 data pages. The pointer pages have used a total of
1385 page slots, indicating that the table is about as large as it has
ever been. Page slots are the pointers on pointer pages. If a page
empties and is released, the page slot it occupied is set to null until
another page is allocated for the table.
The average fill is the amount of space used on each data page. By default,
space is left on every page to hold one record header plus a bit for each
row on the page. That is done to increase the chances that a back version
can be stored on the same page with the primary record version. Keeping
them together saves LOTS of swapping back and forth when back versions are
created and garbage collected. Your average fill looks fine, perhaps a
little low for a stable table - indicating that your row size is small
compared with the page size. Nothing to worry about though.

> Fill distribution:
> 60 - 79% = 1385

This is a degenerate histogram - all the values for fill distribution fall
in a single bar of the histogram - meaning that all pages are evenly filled,
probably to about 69%. That's a good sign, and agrees with your description
of the table as quite stable.

> Index RDB$FOREIGN14 (3)

Ooooh, I just love numbers sorted alphabetically. Anyway, this is the first
index on the CPTVAL table in alphabetical order by index name. It's a
system generated index (starts with "RDB$") supporting a foreign key
(next token is "FOREIGN") and was the 14th such index created in the DDL.
Oddly, its internal identifier is three, indicating that it was the fourth
(InterBase is a C shop, so everything starts at 0) index actually created
on the table. That's inobvious, but makes sense to me. When the DDL is
processed, the actual work is deferred until commit time. The list of
deferred work is built from the front (LIFO) then sorted for dependencies.
Indexes are created LIFO.

> Depth: 2, leaf buckets: 109, nodes: 73373

An index is a tree - upside down if you must be terribly literal - with
the root at the top and the leaves at the bottom. This index has only
two levels. That's good. The bottom level has only 109 pages in it.
That begins to seem suspicious. Those 109 pages point to 73,373
individual rows, which seems like a lot. Let's look further.

> Average data length: 0.00, total dup: 73372, max dup: 32351

The average data length is the average length of the key as stored. This
has very little to do with the length as defined. Index keys actually
hold only three types of data - four if you want to be very fussy and
look at an obscure case. The first, the obscure one, is a byte array.
That's a string in which bytes with a value of zero and 32 (aka space)
are significant. Nobody uses those. The second most obscure is a null -
I've forgotten how they're represented, but it's the same for all datatypes.
The third is a string. Varchar and char are both represented the same
way; both have trailing spaces suppressed. If you're using a multi-byte
collating sequence (French, for sure, but also the other European
languages) the key is translated into its multi-byte format. The result
is a string of bytes that produces the desired order (collating sequence)
when sorted bytewise (fast).
The fourth type is a double precision floating point number. All numerics
and all dates are stored in indexes as double precision numbers, with
trailing zeros suppressed. The number is also mushed around so it
sorts bytewise (sign byte at the front, exponent, then mantissa).
OK, so all that suggests that the key will be expanded if it's a string
and uses a multi-byte collating sequence, or if it's a float, integer,
or small int. Then it shrinks when trailing spaces and zeros are removed
from the end of each segment. It expands again as segment boundaries are
introduced - that's slightly obscure, but it means that segmented keys
can't be quite as long (taken altogether) as single column keys.
Now here comes the good part. The keys are sorted bytewise and prefix
compressed. If the first key is AAA123A, and the next is AAA124B, then
the second is stored as (5)4B, meaning "the first five bytes are the same
as the previous key; then append '4B'".
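The prefix compression described above can be sketched in a few lines; this is an illustration of the idea, not the InterBase source:

```python
# Illustrative sketch of prefix compression over a sorted list of
# byte-string keys: each key is stored as (shared-prefix-length, suffix).

def prefix_compress(keys):
    """Return (shared_prefix_length, suffix) pairs for already-sorted keys."""
    out = []
    prev = b""
    for key in keys:
        # Count how many leading bytes match the previous key.
        n = 0
        while n < min(len(prev), len(key)) and prev[n] == key[n]:
            n += 1
        out.append((n, key[n:]))
        prev = key
    return out

print(prefix_compress([b"AAA123A", b"AAA124B"]))
# -> [(0, b'AAA123A'), (5, b'4B')]  -- matching "(5)4B" above
```

A duplicate key shares its entire prefix with the previous entry, so its stored suffix is empty - which is how a chain of duplicates produces an average data length of zero.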
How does all this lead to an average key length of zero? Well, prefix
compression reduces the key length of a duplicate to zero. And look there,
it says: total dup: 73372, max dup: 32351. So of 73,373 rows, all but
one is a duplicate. Probably all but two, but some of us are better at
boundary conditions than others. The largest number of duplicates on
a single chain is 32351. The numbers don't quite add up, but I suspect
that you've got a two-valued field here that you've indexed to support
a foreign key. That's not the end of the world in a stable table.
Part of the discrepancy (if the longest chain is 32351 and there are
only two chains, then there are only 64702 rows accounted for), may
be in the way the first (and last? don't remember) nodes in an index
page are represented. To reduce the cost of index walking, they're
not compressed. Could the others be nulls? Don't know.
Here's a statistic to remember - you've got 109 leaf pages holding
pointers to 73,000 rows, or approximately 673 nodes per page. Each chain
is about 32,000 nodes long, so each duplicate chain is about 48 pages long.
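That arithmetic checks out; a minimal sketch using the reported numbers:

```python
# Checking the arithmetic above: 109 leaf pages holding ~73,000 nodes.
nodes, leaf_pages, max_dup = 73373, 109, 32351

nodes_per_page = nodes / leaf_pages       # ~673 nodes per leaf page
chain_pages = max_dup / nodes_per_page    # pages spanned by the longest chain
print(round(nodes_per_page), round(chain_pages))  # -> 673 48
```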

> Fill distribution:
> 80 - 99% = 109

The fill distribution for an index is the amount of space in each page
of the index that's used for data and pointers. 80-99% is good. Lower
fill distributions are the strongest clue that you should rebuild the index.
And on to the next index. This one will be quicker, I promise.

> Index RDB$FOREIGN146 (1)

Also an automatically generated index to support a foreign key constraint.

> Depth: 2, leaf buckets: 109, nodes: 73373

Depth, buckets, and nodes, all the same. Did I mention that 'bucket' is
old-geek-speak for a page in an index? Did I mention that old geeks get

> Average data length: 0.00, total dup: 73303, max dup: 7905

This one is bad, but not nearly so terrible. The problem with the
previous index was not so much the number of duplicates, but the length
of each chain. The chains in this index are less than a quarter of the
length of the previous chains - on the order of ten or twelve pages long.
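The same per-page arithmetic from the first index confirms that estimate:

```python
# Same table, same leaf geometry as RDB$FOREIGN14: ~673 nodes per page.
nodes_per_page = 73373 / 109
print(round(7905 / nodes_per_page))  # -> 12 pages for the longest chain
```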

> Fill distribution:
> 80 - 99% = 109

Fill is fine.

> Index RDB$FOREIGN147 (2)
> Depth: 2, leaf buckets: 112, nodes: 73373
> Average data length: 0.00, total dup: 65220, max dup: 23
> Fill distribution:
> 80 - 99% = 112

This automatically generated index still reports the average data length
as zero, but it must be a bigger zero than the other zeros because the
number of leaf buckets is up by three. Each duplicate chain fits within
a single bucket...

> Index RDB$PRIMARY10 (0)
> Depth: 3, leaf buckets: 298, nodes: 73373
>* Average data length: 10.00, total dup: 0, max dup: 0
> Fill distribution:
> 0 - 19% = 1
> 80 - 99% = 297

Also a system generated index, this time to support the primary key
constraint. The key length of 10 indicates that some compression is
done - which is normal and good. The fact that one bucket is underfilled
doesn't mean anything - except that the number of nodes didn't
fit evenly into the buckets. That's completely normal.

Parenthetical Lecture on the Evils of Duplicates.
Here, I'm going to give a brief summary of the problem with long duplicate
chains for those who were asleep or off-list during previous descriptions.
There's no problem with storing duplicates. New entries are stored at the
front of the chain.
In version 5, there's a good chance that an index with lousy distribution
(i.e. long duplicate chains - a two-valued index) won't ruin the
performance of a query. In previous versions, InterBase was completely
capable of looking at two access strategies - find a row based on primary
key and find the same row based on an index with thousands of duplicates.
It would use the primary key in computing a join order but add on the
terrible foreign key in the final pass - so it spent lots of time building
a huge bitmap of qualified rows from the lousy index, then threw out all
the entries but one when it combined that bitmap with the one it created
from the unique index. [Not bright? Maybe not, but we really couldn't
imagine anyone defining an index that was not selective enough to be used.
Why bother?]
However, an update of the key value, or the deletion of a row is really
expensive when you've got lots of duplicates. One of the worst things
you can do to an InterBase database is define a table with a million rows,
all with the same key value for a secondary index, then delete all those
rows. The last duplicate stored appears first in the list, and the first
stored is last. The deletion ordinarily takes the first row stored first,
then the second, and so on. So the index walking code has to run through
the whole duplicate chain for each deletion, always finding the entry it
wants in the very last position. Churns the cache like nothing you ever saw.
One not terribly obvious factor in all this is that the cost of all that
churning and roiling is NEVER borne by the transaction that deletes all
the rows in a table. Updating a key value or deleting a row affects the
index ONLY when the old back version is garbage collected. That means
that cost goes to the next transaction that touches the row AFTER all
the transactions that were active when the update or delete happened have
exited. Let me say that again: nodes are deleted from an index when the
record version is garbage collected, not when it is created!
End of Parenthetical Lecture on the Evils of Duplicates.

On to the table that holds data temporarily while it's being validated
and is flushed and reloaded periodically.

>WIPQUE (149)
> Primary pointer page: 515, Index root page: 516
> Data pages: 61, data page slots: 63, average fill: 15%
> Fill distribution:
> 0 - 19% = 49
> 20 - 39% = 12

A smaller table, certainly, and more volatile. It's been bigger -
there are empty slots. The fill level suggests that rows have been
removed selectively. That's OK.

> Index RDB$FOREIGN263 (1)
> Depth: 1, leaf buckets: 10, nodes: 481
> Average data length: 0.00, total dup: 480, max dup: 480
> Fill distribution:
> 0 - 19% = 10

Every row has the same value. Not good. Terrible fill level, suggesting
spotty deletions.... The whole thing is very small, which is good, because
if it were large it would be dreadful.

> Index RDB$FOREIGN280 (2)
> Depth: 2, leaf buckets: 9, nodes: 481
> Average data length: 0.00, total dup: 474, max dup: 117
> Fill distribution:
> 0 - 19% = 9

Again, as with the previous table, this index is bad, but only a quarter
as bad as the previous one.

> Index RDB$PRIMARY131 (0)
> Depth: 2, leaf buckets: 12, nodes: 481
> Average data length: 1.00, total dup: 0, max dup: 0
> Fill distribution:
> 0 - 19% = 12

Now, finally, on to suggestions.

>---This table WIPQue is very dynamic. It is a WorkInProcessQue where
>external data (500-1000 records/day) are loaded until it can be validated.
>Once validated the data is moved to production tables and the WIPQue records
>are deleted. After several days of loading the system slows down and the
>customer has to do a backup & restore :( for several hours.

Don't do a backup and restore; instead, drop and recreate the foreign
key constraints on the volatile table whenever you can get exclusive
access to it. You could watch the index fill level on those indexes
and when it gets down below 40%, drop and recreate the constraints.
If you can possibly drop the constraint that creates index RDB$FOREIGN263,
do so. If your master tables are stable, then you could use triggers
to validate input data rather than foreign key constraints - triggers
can be less than reliable if the master table is volatile, but they're
fine for a stable master, and they avoid the automatic creation of really
terrible indexes.
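The "below 40%" rule of thumb can be sketched as a small check over gstat-style fill-distribution buckets. The bucket labels mirror the report above, but the `needs_rebuild` helper and its midpoint estimate are my own illustration, not part of any Firebird tool:

```python
# Hypothetical helper: given gstat-style fill-distribution buckets
# (bucket label -> page count), estimate the average fill from bucket
# midpoints and flag an index that has dropped below the 40% threshold
# suggested in the advice above.

MIDPOINTS = {  # midpoint of each gstat fill bucket, in percent
    "0 - 19%": 10, "20 - 39%": 30, "40 - 59%": 50,
    "60 - 79%": 70, "80 - 99%": 90,
}

def needs_rebuild(fill_distribution, threshold=40):
    pages = sum(fill_distribution.values())
    if pages == 0:
        return False
    avg = sum(MIDPOINTS[b] * n for b, n in fill_distribution.items()) / pages
    return avg < threshold

# RDB$FOREIGN263 from the WIPQUE table above: all 10 pages under 20% full.
print(needs_rebuild({"0 - 19%": 10}))    # -> True: drop/recreate the constraint
print(needs_rebuild({"80 - 99%": 109}))  # -> False: fill is fine
```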

Hope this helps,

> Consider the following example:
> Index TOLL_INDEX (3)
> Depth: 3, leaf buckets: 16823, nodes: 7096542
> Average data length: 5.00, total dup: 293392, max dup: 26
> Fill distribution:
> 0 - 19% = 76
> 20 - 39% = 6
> 40 - 59% = 8865
> 60 - 79% = 6861
> 80 - 99% = 1015
> This says there are a lot of total dups but since max dups is low then
> this index is probably okay, Right?

Guess so.

> Now consider the following:
> Depth: 3, leaf buckets: 10348, nodes: 7096545
> Average data length: 0.00, total dup: 7096513, max dup: 809128
> Fill distribution:
> 0 - 19% = 4
> 20 - 39% = 4
> 40 - 59% = 10324
> 60 - 79% = 11
> 80 - 99% = 5
> This index has a large max dup. I should probably add a field to the index
> to reduce that? I assume there is a point at which it would be better not to
> have an index at all if it is going to have a large amount of dups, right?
> Is there a rule of thumb to determine this?

Add the primary key field to the index (hope it's just an integer!).
If there are more than a thousand or so dups, consider doing something ... (YMMV)

Aage J.