Subject | Re: performance expectations and interpreting statistics |
---|---|
Author | Adam |
Post date | 2005-06-09T10:19:48Z |
Hello Bernard,
This sort of comment appears about twice a week, so I am sure by now
it is on a FAQ somewhere, but please take a moment to look at what a
select count(*) has to do.
The engine must do a complete table scan, checking whether each record
is visible to the current transaction. It cannot simply "count the
index nodes", because duplicates are not included twice in the index,
and there is no way to tell from an index node whether the record it
points to is visible to the current transaction without taking a trip
to the data pages. The record count is potentially a different number
for every transaction, as it basically depends on:
1. The initial count
2. Plus any records you have added to it
3. Minus any records you have deleted from it.
It is therefore impossible to store the current table count in some
sort of global variable.
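To make that concrete, here is a rough sketch of how two concurrent
snapshot transactions can legitimately see different counts at the same
moment (the ORDERS table and the numbers are just made-up examples):

```sql
-- Session A runs this script; session B is a separate connection
-- that inserts and commits a row part-way through.
SET TRANSACTION SNAPSHOT;

SELECT COUNT(*) FROM ORDERS;   -- say this returns 1000

-- ... meanwhile session B does:
--   INSERT INTO ORDERS (ID) VALUES (1001);
--   COMMIT;

SELECT COUNT(*) FROM ORDERS;   -- still 1000 inside A's snapshot
COMMIT;

-- A fresh transaction started after B's commit would see 1001,
-- so there is no single "current count" the engine could cache globally.
```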
Of course in design there are always trade-offs. For example, a few
recent comments discussed how the engine *could* store transaction
information in the index itself. The trade-off, as was pointed out, is
that this would likely more than double the index size, which would
hurt performance in pretty much every other area. In addition, there
would be extra overhead in maintaining that index structure, more
headaches for garbage collection and the like, so all in all a bit of
a dead end.
There is a neat trick for doing a select count(*) efficiently if you
really need it; there is a detailed posting about it if you search the
forum:
1. Create a counter table with a single field to hold your count
2. Create a before insert trigger on your table that inserts +1 into the counter table
3. Create a before delete trigger on your table that inserts -1 into the counter table
4. Create a stored procedure that sums the values in the counter table, deletes all the records in it, and inserts the sum back as a single row
Now execute that stored procedure on some sort of schedule or cron job
at a frequent interval (daily / hourly / etc.); a rough sketch of the
objects involved follows below. Whenever you need the count, you just
select the sum of that counter table.
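For what it is worth, here is a rough sketch of what those steps might
look like in Firebird-style PSQL. The object names (ORDERS,
ORDERS_COUNT, and so on) are just placeholders for your own:

```sql
/* 1. Counter table: one row per change, consolidated periodically */
CREATE TABLE ORDERS_COUNT (DELTA INTEGER NOT NULL);

SET TERM ^ ;

/* 2. Before-insert trigger records +1 */
CREATE TRIGGER ORDERS_COUNT_INS FOR ORDERS
ACTIVE BEFORE INSERT
AS
BEGIN
  INSERT INTO ORDERS_COUNT (DELTA) VALUES (1);
END^

/* 3. Before-delete trigger records -1 */
CREATE TRIGGER ORDERS_COUNT_DEL FOR ORDERS
ACTIVE BEFORE DELETE
AS
BEGIN
  INSERT INTO ORDERS_COUNT (DELTA) VALUES (-1);
END^

/* 4. Consolidation procedure: sum, clear, write the total back as one row */
CREATE PROCEDURE CONSOLIDATE_ORDERS_COUNT
AS
  DECLARE VARIABLE TOTAL INTEGER;
BEGIN
  SELECT COALESCE(SUM(DELTA), 0) FROM ORDERS_COUNT INTO :TOTAL;
  DELETE FROM ORDERS_COUNT;
  INSERT INTO ORDERS_COUNT (DELTA) VALUES (:TOTAL);
END^

SET TERM ; ^
```

Then schedule EXECUTE PROCEDURE CONSOLIDATE_ORDERS_COUNT however often
suits you, and whenever you need the count just run
SELECT SUM(DELTA) FROM ORDERS_COUNT.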
To your other question,
If the query accounts for only 15% of the total time, then even the
very best optimisation can make the whole job at most 15% faster. Now,
if the job takes 2 hours, then yes, that is significant (nearly 20
minutes), but if it is only a few seconds you really need to ask
yourself whether the time would be better spent looking at the other
85% for improvement.
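Just to put rough numbers on it:
  best possible saving = 15% of total time
  15% of 2 hours (120 min)  = 18 minutes   -> probably worth chasing
  15% of 10 seconds         = 1.5 seconds  -> probably not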
Hope that helps
Adam