Subject Re: [firebird-support] Query too slow
Author Svein Erling Tysvær
>Hi;
>
>I am using FB 2.5.3. in a fast environment (win8 64bit - QuadCore 3.4 Ghz - 8 GB ram).
>
>If I run this query it goes very very fast and returns 84 records:
>
>Preparing query: select distinct "PROC" from "GEST" where ("GEST"."FSAL" between '20080801' and '20080812') order by "PROC"
>Prepare time: 0.007s
>Field #01: GEST.PROC Alias:PROC Type:STRING(10)
>PLAN SORT ((GEST NATURAL))
>
>But if I run this one, it will return 84 records but it needs 10 minutes !!!:
>Preparing query: select "PROC"."PROC" from "PROC" where ("PROC"."PROC" in (
>select distinct "PROC" from "GEST" where ("GEST"."FSAL" between '20080801' and '20080812'))) order by "PROC"
>Prepare time: 0.003s
>Field #01: PROC.PROC Alias:PROC Type:STR ING(10)
>PLAN SORT ((GEST NATURAL))
>PLAN (PROC ORDER PROC_PK)
>
>Executing...
>Done.
>-1258819480 fetches, 0 marks, 28351438 reads, 0 writes.
>0 inserts, 0 updates, 0 deletes, 11439 index, 962254304 seq.
>Delta memory: 25308 bytes.
>Total execution time: 0:10:53 (hh:mm:ss)
>Script execution finished.
>
>PROC is a table with 11400 rows. Even if the query optimizer takes the worse path (analizing one by one) it seems that 10 minutes is too much ...
>In my opinion, both may run very quick. Why is the second one taking that long?.

Well, your query does tell Firebird to find the DISTINCT possible values of PROC in a subset of GEST 11400 times. Sure, an ideal optimizer would have been able to see that your subselect wasn't correlated (not referencing the outer select) and that you didn't really bother about whether the subselect returned distinct records or not. Newer versions of Firebird do try to transform IN(<SUBSELECT>) into EXISTS(SELECT...), but I guess that your DISTINCT makes it give up. I think someone once told (probably Ann) that one of the early InterBase betas (or maybe it was even before that) had an optimizer that always chose the quickest way to return a result set. It's only problem was that it could use days to find which way was the quickest!

There are several fair ways to reach the result you want:

1) The standard way to write your type of query
select p.PROC from PROC p
where exists(select * from GEST g where g.PROC = p.PROC and g.FSAL between '20080801' and '20080812')
order by PROC

2) Quicker than option 1, but the result could differ a bit and you may want to add DISTINCT
select p.PROC
from PROC p
join GEST g on p.PROC = g.PROC
where g.FSAL between '20080801' and '20080812'

3) Logically similar to your original query in that you treat things like two statements and put them together. A nice way to do things if you construct your SQL dynamically.
with tmp(PROC) as
(select distinct PROC from GEST where (FSAL between '20080801' and '20080812')
select p.PROC from PROC p
join tmp g on p.PROC = t.PROC

Myself, I sometimes use IN (<constant values>) (I even used it earlier today), but I stopped using IN(<subselect>) after 1 or 2 attempts 15 years ago. I've never missed it since.

By the way, your original query can benefit from an index for GEST.FSAL, but cannot use an index for PROC.PROC. My second and third suggestion above could use indexes for GEST.FSAL and PROC.PROC, whereas my first suggestion only could use indexes for GEST.FSAL and GEST.PROC.

HTH,
Set