Subject Re: Performance with indices
Author Adam
--- In firebird-support@yahoogroups.com, Pavel Menshchikov
<developer@...> wrote:
>
> Hello Adam,
>
> A> Selectivity is a measure of index duplicates. A selectivity of 5
> A> means that for every value in the index, there are (on average) 5
> A> records that contain this value. The lower the selectivity, the
> A> more useful it is. A selectivity of 1 indicates a perfect one
> A> value to one record match. Primary keys by definition require
> A> this. Therefore, every primary key index and 'unique' constraint
> A> index will always have a selectivity of 1, regardless of the
> A> number of records in the table.
>
> But it seems that FB works with index statistics, which, I guess,
> equal the selectivity divided by the number of records in the table.

Firebird uses the selectivity in the optimiser. The optimiser
estimates the cost of each potential access path based on the
selectivity and chooses the path that looks the cheapest. Of course
it is a lot more complex than that, but the reason we care about this
statistic at all is that it influences what query plan the optimiser
comes up with, and if your statistics are wrong you are highly likely
to get a sub-optimal plan.
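
If you want to see which plan the optimiser actually settles on, isql
will print it for you. A rough sketch (the PERSON table and the query
are made-up examples, not something from the original question):

  -- in isql: print the plan for each statement that follows
  SET PLAN ON;

  -- the output will be something like PLAN (PERSON NATURAL) or
  -- PLAN (PERSON INDEX (FK_PERSON_GENDER)), depending on the statistics
  SELECT * FROM PERSON WHERE GENDER_ID = 1;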

Selectivity measures duplicates. The more selective an index is, the
more data pages the engine can rule out as containing no data of
interest.

For example, if you had a gender table and a person table with a
roughly even distribution of men and women, then the FK index would
not be useful. In fact, it would be cheaper to read the data in
storage order and skip the gender you are not interested in than to
read the index, create a bitmap and realise you still need to read
99% of the data pages anyway.
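
To make that concrete, here is a sketch against a made-up PERSON
table with an FK index FK_PERSON_GENDER (all names are mine, not from
the original post). You can force either access path with an explicit
PLAN clause and compare the timings yourself:

  -- walk the table in storage order and filter as we go
  SELECT COUNT(*) FROM PERSON WHERE GENDER_ID = 1
  PLAN (PERSON NATURAL);

  -- go through the FK index first, then fetch the matching records
  SELECT COUNT(*) FROM PERSON WHERE GENDER_ID = 1
  PLAN (PERSON INDEX (FK_PERSON_GENDER));

With only two gender values, the second query still has to visit
nearly every data page, plus the index pages on top of that.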

So if my math is right, selectivity equals:

Total Number of Records in Table / Distinct values in indexed field
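
In SQL terms, that works out to something like this (same
hypothetical table and column as above):

  SELECT CAST(COUNT(*) AS DOUBLE PRECISION)
           / NULLIF(COUNT(DISTINCT GENDER_ID), 0) AS SELECTIVITY
  FROM PERSON;

For the gender example that comes out at roughly half the record
count; for a primary key column it is always exactly 1.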

> And if the number of records changes, the index statistics should be
> recomputed too, shouldn't they?

For foreign keys and non-unique indices, yes, changing the number of
records does indeed affect the selectivity.

But what is a primary key?

"A column in a table whose values uniquely identify the rows in the
table. A primary key value cannot be NULL"

There are no duplicates in a primary key index. If there were, then
it could not uniquely identify the row and would not be a candidate
key, let alone the primary key.

If the table had 10 records in it:

PK Selectivity = 10 / 10 = 1

If the table had 500 records in it:

PK Selectivity = 500 / 500 = 1

As you can see, because of the unique constraint on the indexed
field(s), the selectivity of a PK will always be 1, no matter how
many records you insert or delete.

Of course, delete 90% of the records in a table and any FK index may
end up with a totally different selectivity, and may no longer be
useful for a particular query. If the statistics are not updated, the
optimiser may still choose that FK index, with a negative impact on
performance.
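
Recomputing the statistics after a bulk delete like that is a
one-liner per index (the index name is again hypothetical):

  SET STATISTICS INDEX FK_PERSON_GENDER;

A gbak backup/restore cycle rebuilds every index from scratch, so
that refreshes the stored numbers as well.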

> And IIRC, statistics recomputing is a
> semi-automatic process in FB...

Not sure about this. I have seen some pretty out-of-date numbers that
make me suspect otherwise.
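
If you want to check for yourself, the stored figure is visible in
the system table RDB$INDICES. Note that (as Pavel says) what Firebird
stores is the fraction, roughly 1 divided by the number of distinct
values, rather than the per-value count I used above - at least that
is my understanding:

  SELECT RDB$INDEX_NAME, RDB$STATISTICS
  FROM RDB$INDICES
  WHERE RDB$RELATION_NAME = 'PERSON';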


Adam