Subject | Re: [firebird-support] Large DB after batch import: index rebuild/set stats, backup/restore?
---|---
Author | Ann W. Harrison
Post date | 2010-11-10T16:36:58Z
On 11/10/2010 12:21 AM, Kjell Rilbe wrote:
>>> How deep are your indexes?
>
> Seems that on the largest table, they all have depth 4. Are there any
> figures in this report that would warrant a config change or anything,
> e.g. different page size?

Yes, I'd double the page size. Every new index access reads the
full depth of the index tree, so a four-level index is going to be
slower than a three-level index. With luck, a range retrieval
will read across the bottom of the index, but if it conflicts with
an index update, it starts again from the top.
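To make the depth/page-size point concrete, here's a rough sketch. The fanout figures are illustrative assumptions, not Kjell's actual numbers: a B-tree-style index needs roughly ceil(log_fanout(nodes)) levels, so a bigger page (more entries per page) can shave off a level.

```python
import math

def index_depth(nodes: int, fanout: int) -> int:
    """Rough level count for a B-tree-style index holding `nodes`
    entries when each page fans out to about `fanout` entries."""
    return max(1, math.ceil(math.log(nodes, fanout)))

nodes = 131_418_408  # entry count of IX_PK_Uppgift from the gstat output

# Illustrative fanouts only: a page holding ~200 entries needs 4 levels
# for 131M entries; one holding ~600 gets by with 3.
print(index_depth(nodes, 200))   # -> 4
print(index_depth(nodes, 600))   # -> 3
```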
Looking at a couple of indexes:
> Analyzing database pages ...
> Uppgift (567)
>
> Index IX_PK_Uppgift (0)
> Depth: 4, leaf buckets: 588728, nodes: 131418408
> Average data length: 5.08, total dup: 0, max dup: 0
> Fill distribution:
> 0 - 19% = 2968
> 20 - 39% = 15
> 40 - 59% = 279494
> 60 - 79% = 185021
> 80 - 99% = 121230

Hmm. That one could certainly be more dense. Is it possible to
sort your input data by primary key before you store it? In case
it's not clear to everybody, a "leaf bucket" is a bottom-level index
page. Index pages are called buckets for reasons lost in the mists
of time. And index trees have the leaves on the bottom. "Old
Frothingslosh, the pale stale ale with the foam on the bottom".
Maybe I need less coffee.
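As a rough check on "could be more dense": taking the midpoint of each gstat fill band gives an approximate average leaf fill. This is only an estimate, since gstat reports bands rather than exact fill percentages.

```python
# Fill-distribution histogram for IX_PK_Uppgift, copied from the gstat
# output above: band midpoint -> number of leaf buckets in that band.
bands = {
    0.10: 2968,
    0.30: 15,
    0.50: 279494,
    0.70: 185021,
    0.90: 121230,
}

total = sum(bands.values())                                  # 588,728 buckets
avg_fill = sum(mid * n for mid, n in bands.items()) / total  # ~64%
print(f"~{avg_fill:.0%} average leaf fill across {total} buckets")
```

Around 64% average fill is why sorting the input before a bulk load helps: keys arriving in order pack leaf pages much closer to full.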
> Index IX_Uppgift_BorttagsuppA5K (8)
> Depth: 4, leaf buckets: 253733, nodes: 136466214
> Average data length: 0.00, total dup: 136444473, max dup: 119722141
> Fill distribution:
> 0 - 19% = 127
> 20 - 39% = 7685
> 40 - 59% = 131314
> 60 - 79% = 28000
> 80 - 99% = 86607

And this one ... hmm again. Once again, medium fill level,
but here the average data length is zero to two decimal places.
Which is good, but suggests lots of duplicates.

nodes: 136,466,214 <- that's the number of entries in the index
total dup: 136,444,473 <- that's the number that are identical
max dup: 119,722,141 <- that's the longest chain of a single value

Amazing that it works at all. Do you know which value has 119 million
instances? It was less than 10 years ago that gstat changed its
accumulators from 16 to 32 bits.
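A quick back-of-the-envelope on those counters, assuming the usual reading of gstat (total dup counts entries equal to their predecessor, so distinct values ≈ nodes − total dup; max dup is the longest run of duplicates after the first occurrence of a value):

```python
nodes     = 136_466_214
total_dup = 136_444_473
max_dup   = 119_722_141

# Entries that start a new key value, i.e. distinct keys in the index.
distinct = nodes - total_dup            # 21,741 distinct values
# The longest chain plus its first occurrence, as a share of the index.
dominant_share = (max_dup + 1) / nodes  # ~88% of all entries

print(f"{distinct} distinct values; "
      f"{dominant_share:.1%} of entries share a single value")
```

One value covering nearly nine-tenths of a 136-million-entry index is what makes the "amazing that it works at all" comment fair.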
> Index IX_Uppgift_TilläggsuppYNC (9)
> Depth: 4, leaf buckets: 196830, nodes: 131804550
> Average data length: 0.02, total dup: 131766961, max dup: 11251
> Fill distribution:
> 0 - 19% = 9
> 20 - 39% = 1
> 40 - 59% = 3217
> 60 - 79% = 44
> 80 - 99% = 193559

This index looks pretty good. Yes, there are a lot of duplicates, but
they're distributed pretty evenly.
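The same arithmetic (under the same assumed reading of the gstat counters) shows why this one is fine:

```python
nodes     = 131_804_550
total_dup = 131_766_961
max_dup   = 11_251

distinct  = nodes - total_dup   # 37,589 distinct values
avg_chain = nodes / distinct    # ~3,506 entries per value on average

# The longest chain is only a few times the average chain, so no
# single value dominates the index the way it did above.
print(f"{distinct} distinct values, ~{avg_chain:.0f} entries each; "
      f"max dup is {max_dup / avg_chain:.1f}x the average")
```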
Good luck,
Ann