Subject Re: [firebird-support] Large DB after batch import: index rebuild/set stats, backup/restore?
Author Kjell Rilbe
Ann W. Harrison wrote:
>
>
> On 11/10/2010 12:21 AM, Kjell Rilbe wrote:
>
> Yes, I'd double the page size.

I'll try that. Thank you!
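
If I understand the tooling right, the page size can only be set when a
database is (re)created, so the change would be a gbak backup/restore
cycle along these lines (a sketch with placeholder file names; 16384
assumes we're on 8192 today):

  gbak -b -user SYSDBA -password masterkey mydb.fdb mydb.fbk
  gbak -c -p 16384 -user SYSDBA -password masterkey mydb.fbk mydb_16k.fdb

Here -p sets the page size of the newly created database.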

> Looking at a couple of indexes
> >
> > Analyzing database pages ...
> > Uppgift (567)
> >
> > Index IX_PK_Uppgift (0)
> > Depth: 4, leaf buckets: 588728, nodes: 131418408
> > Average data length: 5.08, total dup: 0, max dup: 0
> > Fill distribution:
> > 0 - 19% = 2968
> > 20 - 39% = 15
> > 40 - 59% = 279494
> > 60 - 79% = 185021
> > 80 - 99% = 121230
>
> Hmm. That one could certainly be more dense. Is it possible to
> sort your input data by primary key before you store it? In case
> it's not clear to everybody a "leaf bucket" is a bottom level index
> page. Index pages are called buckets for reasons lost in the mists
> of time. And index trees have the leaves on the bottom. "Old
> Frothingslosh, the pale stale ale with the foam on the bottom".
> Maybe I need less coffee.

The OO framework assigns PK values from a global counter, so they
should be sequential. On the other hand, about 10000 objects are
created in memory before being written to disk, and I suspect they are
not inserted into the tables in PK order. That would explain what we
see.
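
If it turns out to matter, one workaround that wouldn't touch the
framework is to bulk-load into a staging table and then copy the rows
over in key order, assuming our Firebird version honours ORDER BY in
INSERT ... SELECT. A sketch with hypothetical table and column names
(Uppgift_Staging, Id, Data):

  INSERT INTO Uppgift (Id, Data)
  SELECT Id, Data
  FROM Uppgift_Staging
  ORDER BY Id; -- ascending PK order packs the index pages densely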

This index will probably be used only rarely; for the most part we
would find these records via IX_Uppgift_Hållare instead. So I will
ignore the "fluff" here for the time being.

> > Index IX_Uppgift_BorttagsuppA5K (8)
> > Depth: 4, leaf buckets: 253733, nodes: 136466214
> > Average data length: 0.00, total dup: 136444473, max dup: 119722141
> > Fill distribution:
> > 0 - 19% = 127
> > 20 - 39% = 7685
> > 40 - 59% = 131314
> > 60 - 79% = 28000
> > 80 - 99% = 86607
>
> And this one ... hmm again. Once again, medium fill level,
> but here the average data length is zero to two decimal places.
> Which is good, but suggests lots of duplicates.
>
> nodes: 136,466,214 <- that's the number of entries in the index
> total dup: 136,444,473 <- that's the number that are identical
> max dup: 119,722,141 <- that's the longest chain of a single value.
>
> Amazing that it works at all. Do you know which value has 119 million
> instances? It was less than 10 years ago when gstat changed its
> accumulators from 16 to 32 bits.

:-) The index is added by default by the OO framework, and I see now
that it probably doesn't do much good. Most records have NULL in the
indexed field, so that would account for the 119 million chain.

Are we at risk of hitting a hard limit here, such as a maximum
duplicate chain length? That seems like an odd limit to have, but I'm
asking just in case. In Firebird, I mean - I don't care about gstat.

At this time this index isn't used at all - we don't look up records
that way. Situations where it would help may arise in the future, but
right now it's pure overhead, so I'm inclined to drop it (as sketched
below).
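
Assuming the OO framework tolerates manual schema changes, removing it
should be a one-liner (index name taken from the gstat output above):

  DROP INDEX IX_Uppgift_BorttagsuppA5K;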

> > Index IX_Uppgift_TilläggsuppYNC (9)
> > Depth: 4, leaf buckets: 196830, nodes: 131804550
> > Average data length: 0.02, total dup: 131766961, max dup: 11251
> > Fill distribution:
> > 0 - 19% = 9
> > 20 - 39% = 1
> > 40 - 59% = 3217
> > 60 - 79% = 44
> > 80 - 99% = 193559
>
> This index looks pretty good. Yes there are a lot of duplicates, but
> they're distributed pretty evenly.

This is the one holding the most "real" values. It should never be
NULL, and the max dup chain of 11251 seems reasonable considering how
the data was added: 250 master records per transaction, each adding
about 40-50 child records in this table, and all of those child
records point to a one-record-per-transaction table via the indexed
field, so 250 * 45 = 11,250, which matches the observed chain.

All in all, it occurs to me that there are a few indexes here and
there that we'd better remove. That would reduce maintenance cost when
inserting records and probably reduce the risk of suboptimal query
plans in some cases; the query sketched below should list the
candidates to review. That's a good thing to be aware of, and I thank
you for it!
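
To get a complete list of candidates, something like this against the
system tables should work (a sketch; the COALESCE is there because
RDB$SYSTEM_FLAG may be NULL on user-defined objects):

  SELECT RDB$RELATION_NAME, RDB$INDEX_NAME
  FROM RDB$INDICES
  WHERE COALESCE(RDB$SYSTEM_FLAG, 0) = 0
  ORDER BY RDB$RELATION_NAME, RDB$INDEX_NAME;

Constraint-backing indexes (PK/FK) will still show up in that list, so
anything to be dropped would need cross-checking against
RDB$RELATION_CONSTRAINTS first.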

Regards,
Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-mail: kjell@...
Phone: 08-761 06 55
Mobile: 0733-44 24 64