Subject: Re: [firebird-support] Large DB after batch import: index rebuild/set stats, backup/restore?
Author: Ann W. Harrison
On 11/10/2010 12:21 AM, Kjell Rilbe wrote:

>>> How deep are your indexes?
>
> Seems that on the largest table, they all have depth 4. Are there any
> figures in this report that would warrant a config change or anything,
> e.g. different page size?
>

Yes, I'd double the page size. Every new index access reads the
full depth of the index tree, so a four-level index is going to be
slower than a three-level index. With luck, a range retrieval
will read across the bottom of the index, but if it conflicts with
an index update, it starts again from the top.
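
If you do bump the page size, note that it can only be set when the
database file is created, so in practice that means a backup and
restore. A sketch only, with file names and credentials as
placeholders (16384 is the maximum page size in the 2.x line):

   gbak -b -user SYSDBA -password masterkey big.fdb big.fbk
   gbak -c -p 16384 -user SYSDBA -password masterkey big.fbk big_new.fdb

The -p switch on the restore sets the new page size; the backup
file itself is page-size neutral.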

Looking at a couple of indexes:
>
> Analyzing database pages ...
> Uppgift (567)
>
> Index IX_PK_Uppgift (0)
> Depth: 4, leaf buckets: 588728, nodes: 131418408
> Average data length: 5.08, total dup: 0, max dup: 0
> Fill distribution:
> 0 - 19% = 2968
> 20 - 39% = 15
> 40 - 59% = 279494
> 60 - 79% = 185021
> 80 - 99% = 121230

Hmm. That one could certainly be denser. Is it possible to
sort your input data by primary key before you store it? In case
it's not clear to everybody, a "leaf bucket" is a bottom-level
index page. Index pages are called buckets for reasons lost in the
mists of time, and index trees have the leaves on the bottom. "Old
Frothingslosh, the pale stale ale with the foam on the bottom."
Maybe I need less coffee.
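
On the sorting: if the batch lands in a staging table first, you
can load in key order with an ordered INSERT ... SELECT. Just a
sketch; UppgiftStage and the column names are invented for
illustration:

   INSERT INTO "Uppgift" (Id, Namn)
     SELECT Id, Namn
     FROM UppgiftStage
     ORDER BY Id;

Sorting the external input file before loading gets the same effect
without asking the engine to sort 131 million rows.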

>
> Index IX_Uppgift_BorttagsuppA5K (8)
> Depth: 4, leaf buckets: 253733, nodes: 136466214
> Average data length: 0.00, total dup: 136444473, max dup: 119722141
> Fill distribution:
> 0 - 19% = 127
> 20 - 39% = 7685
> 40 - 59% = 131314
> 60 - 79% = 28000
> 80 - 99% = 86607

And this one ... hmm again. Once again, a medium fill level,
but here the average data length is zero to two decimal places.
That's good for density, but it suggests lots of duplicates.

nodes: 136,466,214 <- that's the number of entries in the index
total dup: 136,444,473 <- that's the number that are identical
max dup: 119,722,141 <- that's the longest chain of a single value.

Amazing that it works at all. Do you know which value has 119 million
instances? It was less than ten years ago that gstat changed its
accumulators from 16 to 32 bits.
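
If you want to find out, RDB$INDEX_SEGMENTS tells you which column
the index is built on, and a GROUP BY shows the heavy hitters. Fair
warning: the second query scans all 136 million rows, and TheColumn
is just a stand-in for whatever the first query returns:

   SELECT RDB$FIELD_NAME
   FROM RDB$INDEX_SEGMENTS
   WHERE RDB$INDEX_NAME = 'IX_Uppgift_BorttagsuppA5K';

   SELECT TheColumn, COUNT(*)
   FROM "Uppgift"
   GROUP BY TheColumn
   ORDER BY 2 DESC
   ROWS 5;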

>
> Index IX_Uppgift_TilläggsuppYNC (9)
> Depth: 4, leaf buckets: 196830, nodes: 131804550
> Average data length: 0.02, total dup: 131766961, max dup: 11251
> Fill distribution:
> 0 - 19% = 9
> 20 - 39% = 1
> 40 - 59% = 3217
> 60 - 79% = 44
> 80 - 99% = 193559

This index looks pretty good. Yes, there are a lot of duplicates, but
with a longest chain of only 11,251 they're distributed pretty evenly
across the distinct values.
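
Since your subject line also asks about index rebuild and set stats:
for an ordinary index (not one backing a primary key or unique
constraint) you can rebuild in place after the load, which also
repacks the pages. SET STATISTICS only recomputes the selectivity;
it doesn't rebuild anything. A sketch, quoting the name because
gstat shows it in mixed case:

   ALTER INDEX "IX_Uppgift_BorttagsuppA5K" INACTIVE;
   COMMIT;
   ALTER INDEX "IX_Uppgift_BorttagsuppA5K" ACTIVE;
   COMMIT;
   SET STATISTICS INDEX "IX_Uppgift_BorttagsuppA5K";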

Good luck,

Ann