Subject | Re: [firebird-support] LEFT OUTER JOIN speed on a simple query (Svein)
---|---
Author | Michael Ludwig
Post date | 2010-05-15T13:04:40Z
SoftTech wrote on 15.05.2010 at 07:06:26 (-0500):
"total dup" and "max dup" to get an approximative idea. The first is the
number of indexed rows in the table, the second is the number of rows
the value of which is not unique in the index (so there are duplicates
for this value), and the third is the maximum number of duplicates.
If "max dup" is 100,000 and you query for that value, the index might
cause more trouble than benefit.
When "max dup" is high compared to "total dup", your index has poor
selectivity.
You could also run the following *expensive* query to get the selectiviy
for an index on B in FS (1 being optimal = primary key or unique, lower
values meaning poorer selectivity):
WITH FSGR( B, OCC )
AS ( SELECT B, COUNT(*) FROM FS GROUP BY B )
SELECT
COUNT(*) AS DISTINCT_ROWS,
SUM(OCC) AS TOTAL_ROWS,
CAST( COUNT(*) AS NUMERIC(4,3) ) / SUM(OCC) AS SELECTIVITY
FROM FSGR;
I think there are at least two things that will spoil selectivity:
(1) excessive numbers of duplicates for one or more values
(2) low number of distinct values in general
I trust others will correct any flaws or mistakes in my reasoning ...
--
Michael Ludwig
> You asked "what is the selectivity of all indexed fields involved in
> the JOIN and WHERE clauses etc." I use Database Workbench as my
> Firebird development tool. How do I find this information?

You run gstat -index on the database and look at the values for "nodes",
"total dup", and "max dup" to get an approximate idea. "Nodes" is the
number of indexed rows in the table, "total dup" is the number of rows
whose indexed value is not unique (i.e. there are duplicates for that
value), and "max dup" is the maximum number of duplicates for any single
value.
If "max dup" is 100,000 and you query for that value, the index might
cause more trouble than benefit.
When "max dup" is high compared to "total dup", your index has poor
selectivity.
You could also run the following *expensive* query to get the selectivity
of an index on B in FS (1 being optimal = primary key or unique, lower
values meaning poorer selectivity). Note the cast to DOUBLE PRECISION:
it keeps the division from truncating to an integer, and unlike a small
NUMERIC it cannot overflow for large counts:

WITH FSGR( B, OCC )
  AS ( SELECT B, COUNT(*) FROM FS GROUP BY B )
SELECT
  COUNT(*) AS DISTINCT_ROWS,
  SUM(OCC) AS TOTAL_ROWS,
  CAST( COUNT(*) AS DOUBLE PRECISION ) / SUM(OCC) AS SELECTIVITY
FROM FSGR;
I think there are at least two things that will spoil selectivity:
(1) excessive numbers of duplicates for one or more values
(2) low number of distinct values in general
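Both effects show up directly in the DISTINCT/TOTAL ratio that the query
above computes. A small Python sketch with hypothetical data (the column
values are invented for illustration):

```python
# Selectivity = distinct values / total rows; 1.0 means every value is unique.

def selectivity(values):
    # Same ratio as DISTINCT_ROWS / TOTAL_ROWS in the SQL query.
    return len(set(values)) / len(values)

# Case (1): one value with an excessive number of duplicates.
skewed = ["A"] * 100_000 + [f"V{i}" for i in range(100)]

# Case (2): a low number of distinct values overall (e.g. a yes/no flag).
flags = ["Y", "N"] * 50_000

print(selectivity(skewed))  # about 0.001, despite 101 distinct values
print(selectivity(flags))   # 0.00002
```

In both cases the ratio ends up far below 1, which is why an index on
such a column rarely helps the optimizer.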
I trust others will correct any flaws or mistakes in my reasoning ...
--
Michael Ludwig