Subject Re: [IB-Architect] Index Sync/Rebuild Question
Author Paul Reeves
Bill Karwin wrote:
> Paul Reeves wrote:
> > I have often wondered why the engine doesn't automatically recompute
> > these values.
> Probably because the computational cost to recompute the value is
> significant enough that you generally don't want it recalculating too
> often.
> Also, I think it'd take a pretty drastic data change to alter the
> selectivity by a large amount.

Except for one very common situation. Create your database. Populate it with
data. Start using it. Then check the selectivity. You will find that almost
every index will have 0 - the selectivity that existed when the db was created.
Of course a backup and restore will fix this.

> I believe the selectivity is the ratio
> of unique values in the index to total number of records indexed,
> right?

No. It is the reciprocal of the number of unique values. ( 1/n ).

> This ratio unlikely to change, even if you delete or import
> large amounts of data in the table.

The value depends on the uniqueness when the index is created, usually. An
indexed column of 5 rows and four unique values will have a selectivity of 0.25.
A indexed column of 10 rows and four unique values will still have a selectivity
of 0.25. If you add twenty rows and the uniqueness becomes, say, 16, the
selectivity should become 0.0625. Except that the value stored in rdb$indices
will still be 0.25 as the selectivity is not updated.

The idea is that the smaller the number the quicker the data retrieval (fewer
rows/pages to check for the matching conditions)

How much significance all this has depends on the question 'How much credence
does the optimiser give to the selectivity as stored in rdb$indices.' If you
look at the output of gstat it is evident that the index stores a close
approximation of the actual number of unique values ( nodes - total_dup ).
Therefore it can ignore the rdb$statisitics column entirely.

My guess (hope) is that the optimiser will start with the rdb$statistics column
and revise its plan upon actually looking at the indexes chosen for the plan if
there is a wild divergence. Assuming, that is, that looking up rdb$statistics is
quicker than looking at the cached index structures.

It seems to me that if rdb$statistics is important then it should be maintained,
especially under super server. What else are background, low priority threads


Paul Reeves
Fleet River Software