firebird-support - Re: Why "SELECT FIRST 1 ..." and "EXISTS ( SELECT * FROM ... )" are so slow in a big database?

Subject	Re: Why "SELECT FIRST 1 ..." and "EXISTS ( SELECT * FROM ... )" are so slow in a big database?
Author	Antti Nivala
Post date	2006-07-16T08:41:56Z

Ali Gökçen wrote:

> DELETE FROM PV_TYPE
> WHERE PV = 2 AND
> ( SELECT min(PV) FROM DV_PV WHERE PV = 2 ) IS NULL

Ali, I tried this with my database and unfortunately it does not run any faster. The plan is different as you said, but nevertheless it appears to be fetching the same amount of data from the index. The execution time varies between 350 ms and 550 ms on my computer, which is pretty much the same as about 400 ms with my original statement.

I guess there is no way to circumvent fetching all matches to PV = 2 from the index as Dmitry said that the bitmap will be built from the index before going to records at all.

Since 400 ms per delete is too much for our application, we will use a manual counter as suggested by Lester Caine. We actually already have such a field in the PV table for other purposes. One reason why I said that it is not an elegant solution is that our counter is not exact. It is just an approximation because update conflicts can prevent us from updating the counter in some cases. We don't want to retry the whole complex transaction because of that so we just ignore the failure of updating the counter. So, we cannot fully rely on the counter. But we can improve the performance a lot by using the counter to avoid the EXISTS check if the counter is fairly large (e.g. > 100 or > 1000). In those cases we just conclude that making a true existence check is not reasonable because it would take too much time.

Adam asked if I have really measured the performance of using a NATURAL scan in this case. Yes, that is really too slow. We have millions of distinct PV values in the table so it easily takes FB to read a couple of million records before finding a match. Definitely too slow. So, the index is good there for most cases, but we have a few values (e.g. PV = 2) which have a large number of duplicates, and those cases turned out to be slow.

Anyway, using a manual counter for avoiding the slow EXISTS checks gives good performance. Thanks for all suggestions.

Antti