Subject Re: Why "SELECT FIRST 1 ..." and "EXISTS ( SELECT * FROM ... )" are so slow in a big database?
Author Ali Gökçen
Ok,
i have an another idea if your application is time critical.

we can create another field with ideal index selectivity.

PV_ID = PV*1000000000+ID -- you may be prefer BIGINT for more zeros
you can fill this field by before update&insert triggers.

create unique index on DV_PV(PV_ID);

then:

DELETE FROM PV_TYPE
WHERE PV = 2 AND
( SELECT min(PV_ID) FROM DV_PV
WHERE PV_ID BETWEEN 2000000000 and 2999999999
) IS NULL

it shouldn't be slow but has some cost.

Regards.
Ali


--- In firebird-support@yahoogroups.com, "Antti Nivala"
<antti.nivala@...> wrote:
>
> Ali Gökçen wrote:
> > DELETE FROM PV_TYPE
> > WHERE PV = 2 AND
> > ( SELECT min(PV) FROM DV_PV WHERE PV = 2 ) IS NULL
>
> Ali, I tried this with my database and unfortunately it does not
run any faster. The plan is different as you said, but nevertheless
it appears to be fetching the same amount of data from the index. The
execution time varies between 350 ms and 550 ms on my computer, which
is pretty much the same as about 400 ms with my original statement.
>
> I guess there is no way to circumvent fetching all matches to PV =
2 from the index as Dmitry said that the bitmap will be built from
the index before going to records at all.
>
> Since 400 ms per delete is too much for our application, we will
use a manual counter as suggested by Lester Caine. We actually
already have such a field in the PV table for other purposes. One
reason why I said that it is not an elegant solution is that our
counter is not exact. It is just an approximation because update
conflicts can prevent us from updating the counter in some cases. We
don't want to retry the whole complex transaction because of that so
we just ignore the failure of updating the counter. So, we cannot
fully rely on the counter. But we can improve the performance a lot
by using the counter to avoid the EXISTS check if the counter is
fairly large (e.g. > 100 or > 1000). In those cases we just conclude
that making a true existence check is not reasonable because it would
take too much time.
>
> Adam asked if I have really measured the performance of using a
NATURAL scan in this case. Yes, that is really too slow. We have
millions of distinct PV values in the table so it easily takes FB to
read a couple of million records before finding a match. Definitely
too slow. So, the index is good there for most cases, but we have a
few values (e.g. PV = 2) which have a large number of duplicates, and
those cases turned out to be slow.
>
> Anyway, using a manual counter for avoiding the slow EXISTS checks
gives good performance. Thanks for all suggestions.
>
> Antti
>