Subject | About index selectivity
---|---
Author | Alexandre Benson Smith |
Post date | 2003-01-15T23:24:51Z |
As I commented a few days ago...
How can one know what is the ideal selectivity or break-even point to
create/drop an index...
I was looking for some more info about index selectivity.
I read some articles, and understand that:
If I have a "Customer" table with 100,000 rows, and a field Country
with 20 different values, the index selectivity will be 100,000/20, which is
an average of 5,000 records per country.
Am I correct so far?
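The arithmetic above can be sketched as follows. This is only an illustration of the numbers in my example, not Firebird's actual implementation; note that some engines store selectivity as the reciprocal (1 / distinct values) rather than as rows-per-value, so both figures are shown:

```python
# Illustrative arithmetic only (an assumption, not Firebird internals):
# average rows per key value, and the reciprocal form 1/distinct
# that some engines record as the index's selectivity.
total_rows = 100_000
distinct_countries = 20

avg_rows_per_value = total_rows / distinct_countries  # 5000.0 rows per country
selectivity = 1 / distinct_countries                  # 0.05

print(avg_rows_per_value)  # 5000.0
print(selectivity)         # 0.05
```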
If I send a "Select * from Customer where Country = 'Brazil'", I think the
optimizer will not use that index because it has "low selectivity".
But suppose 90% of my customers are in Brazil and the other 10% are spread
across the other 19 countries.
If I send the above statement I will be retrieving 90,000 rows out of 100,000,
and an indexed search will be far more expensive than a natural
read, so the optimizer should discard that index and perform a natural scan.
Am I still on the right path?
But if I send "Select * from Customer where Country = 'Argentina'", the
indexed read will be good, because I have only 10 customers in Argentina.
If the optimizer could know whether the specified value has few records, it
could go for an indexed search; if the value has a lot of records, it could
choose a natural scan instead.
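The decision I am describing could be sketched like this. The 10% threshold and the sample data are my own assumptions for illustration; a real optimizer would use an actual cost model:

```python
from collections import Counter

# Hypothetical sketch of a histogram-driven plan choice (assumed
# threshold of 10%, not a real optimizer rule). The histogram counts
# rows per distinct value of the Country column.
rows = ['Brazil'] * 90_000 + ['Chile'] * 9_990 + ['Argentina'] * 10
histogram = Counter(rows)
total = len(rows)  # 100_000

def choose_plan(value, threshold=0.10):
    """Pick an indexed read when the value matches few rows,
    otherwise fall back to a natural (sequential) scan."""
    matching = histogram.get(value, 0)
    return 'INDEX' if matching / total < threshold else 'NATURAL'

print(choose_plan('Argentina'))  # INDEX   (10 of 100,000 rows)
print(choose_plan('Brazil'))     # NATURAL (90,000 of 100,000 rows)
```

With a single average selectivity figure per index, both queries look the same to the optimizer; only a per-value histogram distinguishes them.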
Would it be good if a histogram of record counts per value were maintained in
the index statistics?
The histogram would have to be updated (recomputed) with some frequency (depending
on how the data changes/grows), for example when a "set statistics" is executed on
that index.
Or is this just the way that index selectivity is computed?
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda.
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br
----------
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.437 / Virus Database: 245 - Release Date: 06/01/2003