Subject Re: [firebird-support] Re: Index analysis - what does mean nodes, dup and max dup?
Author Ann W. Harrison
emb_blaster wrote:
>>
>>> Data pages: 178, data page slots: 178, average fill: 80%
>
> If I understand correctly, a lower average fill (for example 20%) means
> a table is using many pages that are not filled with data (for example
> after a huge delete following a huge insert). In that case I can
> "correct" this by running a backup and restore (which acts a bit like a
> defrag in Windows), or simply let it fill up again with daily inserts
> (which can take some time in some cases...). This means it may or may
> not slow down the DB, but normally it will not. Right?

As usual, it depends. If you're fetching one record by primary key,
you don't care much about the packing density. If you're fetching
bunches of records, the more you get with each page read, the better.
gstat is normally used to diagnose performance problems. If your
performance is OK, then don't worry about packing density.
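
If you do want to look at packing density, gstat runs against the
database file itself. A minimal sketch, assuming a local database named
mydb.fdb (your file name, and any -user/-password switches, will
differ):

    gstat -d mydb.fdb     (data page statistics only)
    gstat -a mydb.fdb     (data page and index statistics)

The average fill figures you quoted come from that output.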

>
>
>> The first index is the system generated index that maintains the
>> primary key. It is index 0 for this table.
>
> Does index 0 mean it is the first index created, or that SELECTs will
> try this index first? Or maybe both?

It's the first index created. The optimizer chooses indexes based
on the conjuncts in your ON and WHERE clauses - best match between
index keys and conjuncts.
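
As a rough sketch of what I mean by conjuncts - the table and column
names here are invented, not taken from your database:

    SELECT *
      FROM ORDERS O
      JOIN CUSTOMERS C ON C.CUST_ID = O.CUST_ID  -- conjunct in ON
     WHERE O.ORDER_DATE = '2009-01-15';          -- conjunct in WHERE

Here the optimizer would consider an index on CUSTOMERS (CUST_ID) for
the join and one on ORDERS (ORDER_DATE) for the WHERE clause; in isql,
SET PLAN ON shows which indexes it actually chose.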

>
>>> Depth: 2, leaf buckets: 11, nodes: 8865
>
> So an index with growing depth is bad. If I had an index in a table
> with depth >= 5, I MUST do something (like change the DB page cache)?

Yes, we generally recommend increasing the page size (not the
cache size!) when the depth of any index goes above 3. Let me
repeat, if the depth of any index is greater than 3, backup your
database and restore it with a larger page size.
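
The backup and restore cycle is done with gbak; a minimal sketch,
assuming a local database mydb.fdb and an 8K target page size (add
-user and -password switches as your setup requires):

    gbak -b mydb.fdb mydb.fbk
    gbak -c -p 8192 mydb.fbk mydb_new.fdb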

>
> Isn't it 2599 distinct values (8865-6266=2599)?

Yes, of course - I did say that I doubted my subtraction. My reading
comprehension isn't very good either.

> So can the average data length be calculated by the expression
> "(nodes-totaldup)/nodes"?

No. You get a lower average data length when you have duplicates,
but there are other factors that compress keys. One is suffix
compression, which just means that for a segment of a key that
has trailing spaces (for text) or zeros (for numbers) the trailing
characters are removed. Numeric index keys are represented as
floating point numbers, so the trailing digits are not significant.
The other is prefix compression, which works best on duplicates;
it removes all leading bytes that are the same as the previous
key. For example, the key values AAAA, AAAB, AABC, AABD would
compress to AAAA, 3B, 2BC, 3D. A factor working against compression
is that the first key on each page is not compressed (bigger pages
are better).

> In this case, is a bigger average data length a "good" index and a
> smaller one a "bad" index that can slow down queries?

Having a lot of duplicates makes an index more dense, so you get
more entries on a page. That's good. Having data that compresses
well also makes indexes dense. But you should index fields based
on usage, not compression. If you frequently search for records
based on values of a particular field, index it. Yes, there are
edge cases - if the number of duplicates is so large that you're
going to read every data page anyway, then don't bother with the
index.

> Does that mean the first index is better than this second one?

No, indexes are good if they make queries faster, bad otherwise.
>
>
> Take, for example, the third index:
>
> Index SAIDAPRODUTOS_IDX2 (2)
> Depth: 2, leaf buckets: 9, nodes: 8865
> Average data length: 0.02, total dup: 8740, max dup: 156
>
> This index is on a DATE column. As you can imagine, I use it in many,
> many queries. It has only 125 distinct values; one of those values is
> repeated 156 times.
> Is this a bad index? Can I drop this index?
>
> This table has only a few inserts and is just an example, but once I
> understand this I hope I can apply it to any table in any FDB.

That index is probably OK. When you find more than twenty or thirty
thousand duplicates of a single value, you should either find a way
to make the index more selective (e.g. make it a compound index with
the primary key as the second term) or move to Firebird V2 or greater.
Duplicates don't matter very much for retrievals, but they really
slow down garbage collection in older versions.
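
For the record, the compound-index trick looks something like this -
the column names are placeholders, since I don't know your actual
definitions (assume DATA_SAIDA is the date column and ID the primary
key):

    CREATE INDEX SAIDAPRODUTOS_DATE_ID
        ON SAIDAPRODUTOS (DATA_SAIDA, ID);

With the primary key appended, every key value is unique, so the long
duplicate chains that slow garbage collection in older versions
disappear, and equality searches on the date column alone can still
use the leading segment of the index.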

Best,

Ann