Subject About index selectivity
Author Alexandre Benson Smith
As commented earlier these days...

How can one know what is the ideal selectivity or break-even point to
create/drop an index...

I was looking for some more info about index selectivity.

I read some articles, and understand that:

If I have a "Customer" table with 100.000 rows, and have a field Country
with 20 diferent values, the index selectivity will be 100.000/20 wich is
an average of 5000 records per country.

Am I correct until now ?

If I send a "Select * from Customer where Country = 'Brazil'" I think the
optimizer will not use that index because it have "low selectivity".

But, If 90% of my customers are in Brazil the other 10% are spread around
the other 19 countries.

If I send the above statment I will be retrieving 90.000 rows of 100.000
and if it is a indexed search will be far more expensive than a natural
read so optimizer should discard that index and perform a natural search.

I continue on the right path ?

But if I send "Select * from Customer where country = 'Argentina'" the
indexed read will be good because I have only 10 customers in Argentina.

If the optimizer could know if the specified value has few records go for
an indexed search, if the value has a lot of records, than choose to do a
natural search.

Will be good if an histogram of records with same value is maintained on
the index statistics ?

The histogram should be updated (recomputed) with some frequency (depending
on how the data changes/increases) when a "set statistics" is executed on
that index.

Or is just this way that the indices selectivity are computed ?



Alexandre Benson Smith
Development
THOR Software e Comercial Ltda.
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br

----------


---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.437 / Virus Database: 245 - Release Date: 06/01/2003


[Non-text portions of this message have been removed]