Subject: Re: Index analysis - what do nodes, dup and max dup mean?
Author: emb_blaster
> If your
> performance is OK, then don't worry about packing density.
> ....
> Yes, we generally recommend increasing the page size (not the
> cache size!) when the depth of any index goes above 3.

Sorry, I mistyped. I meant to type "page size" but typed "page cache"
instead. My mistake.
Indeed, I understand!

> > so can the average data length be calculated by the expression
> > "(nodes-totaldup)/nodes"?
>
> No. You get a lower average data length when you have duplicates,
> but there are other factors that compress keys....

Oh yes! Now that you mention it, I remember reading this in your text
(prefix, suffix, trailing spaces, etc.). I will read it again to fix it
in my mind ;)

>
> > in this case? is a bigger average data length a "good" index and a
> > smaller one a "bad" index that can slow down a query?
>
> Having a lot of duplicates makes an index more dense, so you get
> more entries on a page. That's good. Having data that compresses
> well also makes indexes dense. But you should index fields based
> on usage, not compression. If you frequently search for records
> based on values of a particular field, index it. Yes, there are
> edge cases - if the number of duplicates is so large that you're
> going to read every data page anyway, then don't bother with the
> index.
>
OK, that's it! I'm trying to understand these edge cases, where an index
can be bad and, instead of helping SELECTs, only slows down INSERTs and
UPDATEs.
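
To spot those edge cases, I suppose I can measure the selectivity myself
with a query like this (SAIDAPRODUTOS is my table; MY_DATE_COL is just a
placeholder for the indexed column):

   SELECT COUNT(DISTINCT MY_DATE_COL) AS distinct_vals,
          COUNT(*) AS total_rows
     FROM SAIDAPRODUTOS;

If distinct_vals is tiny compared to total_rows, a lookup on any one
value would touch a large share of the data pages anyway, so the index
buys little.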

> > does that mean the first index is better than this second one?
>
> No, indexes are good if they make queries faster, bad otherwise.

So only by running a query can I tell whether an index is good or bad?
Here are more recent stats. Just by looking at them, can I say whether
these indexes are good or bad?
Index SAIDAPRODUTOS_IDX2 (2)
Depth: 3, leaf buckets: 843, nodes: 698840
Average data length: 0.01, total dup: 698609, max dup: 4158
Index SAIDAPRODUTOS_IDX3 (3)
Depth: 3, leaf buckets: 1349, nodes: 698840
Average data length: 0.01, total dup: 697202, max dup: 99598
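
For example, I have been checking in isql whether the optimizer actually
picks these indexes (the WHERE column is just a placeholder for the field
each index covers):

   SET PLAN ON;
   SELECT COUNT(*)
     FROM SAIDAPRODUTOS
    WHERE SOME_FIELD = 12345;

With SET PLAN ON, isql prints the chosen plan, something like
PLAN (SAIDAPRODUTOS INDEX (SAIDAPRODUTOS_IDX3)), so I can see whether an
index is used at all.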


> > Index SAIDAPRODUTOS_IDX2 (2)
> > Depth: 2, leaf buckets: 9, nodes: 8865
> > Average data length: 0.02, total dup: 8740, max dup: 156
> >
> > this index is on a DATE column. As you can imagine, I use it in many,
> > many queries. It has only 125 distinct values; one of these values is
> > repeated 156 times.
> > Is this a bad index? Can I drop it?
>
> That index is probably OK. When you find more than twenty or thirty
> thousand duplicates of a single value, you should either find a way
> to make the index more selective (e.g. make it a compound index with
> the primary key as the second term) or move to Firebird V2 or greater.
> Duplicates don't matter very much for retrievals, but they really
> slow down garbage collection in older versions.
>

Yes, I'm currently using Firebird SS 2.0.4 and will continue using new
2.0.x versions until Jan/2009 while testing FB 2.1.x or a stable FB 2.5
release for the upgrade. So I don't need to bother about duplicates, even
if there are hundreds of thousands of them?
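
And just so I am sure I follow the compound-index suggestion: would it be
something like this? (DATA_SAIDA and ID_SAIDA are made-up names for my
date column and primary key.)

   CREATE INDEX SAIDAPRODUTOS_IDX_DATE_PK
       ON SAIDAPRODUTOS (DATA_SAIDA, ID_SAIDA);

Appending the primary key would make every key value unique, so the long
duplicate chains disappear.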

> Best,
>
> Ann
>

Thank you for your help again. Sorry this topic is taking so long, but I
want to understand how indexes work.

Best,
Elton