Subject Re: [firebird-support] Subselect causes slow query
Author Helen Borrie
At 01:31 PM 9/09/2005 -0400, you wrote:
>The times listed are what it takes to run each query separately. If I
>run it as one query, I kill it after several minutes.
>The plan is natural for all querys. The only thing I can think of is
>that FB is evaluating the IN statement for every row.

You got it. That's exactly what happens with a correlated subquery. They
are great for small sets and a dog for large ones. For every value of NDC
in the first cursor, the engine has to form a set and count it.

>-- 0.781s
>select
>
>di2.ID,di2.NDC,di2.FRMTD_NDC,di2.UPC_HRI,di2.FRMTD_MDDB_UPC_HRI,di2.GPI,
>di2.STATUS,di2.NAME,di2.LABELNAME
>from
> MDDB_DRUGITEM di2
>where
> di2.NDC in
>(
>-- 0.561s
>select
> d1.ndc
>from
> mddb_drugitem d1
>where
> d1.upc_hri is not null and d1.ndc is not null
>group by
> d1.ndc
>having
> count(*) > 1
>)

However, the overriding problem here is the logic of the subquery. You are
correlating the main query with the subquery via a nullable key AND
predicating the search on having more than one occurrence of that key that
is not null. Null == Not Null is logically absurd. I think the engine is
going to take several decades to try to satisfy you on rows where a null is
being passed to the subquery as the correlation key.

Use search criteria on the outer query to ensure that the subquery only
gets to process records where the correlation key is not null and both of
the search keys are not null. Then, all the subquery has to do is count
the records that match:

select
di2.ID,
di2.NDC,
di2.FRMTD_NDC,
di2.UPC_HRI,
....
from
MDDB_DRUGITEM di2
where
di2.NDC is not null
and di2.UPC_HRI is not null /* eliminates all rows where one or the
other is null */
and exists (
select
d1.ndc
from mddb_drugitem d1
where d1.ndc = di2.NDC
group by 1
having count(d1.ndc) > 1)

If this is too slow, consider writing an elegant selectable SP to avoid the
double-dipping entirely.

./heLen