Subject [firebird-support] Re: FB 2.1.1 slow performance with subselect
Author Svein Erling Tysvaer
Hi Ed!

Let's look at the two separate statements:

A)
select x.* from online_order x where x.order_id in
(
select k.order_id from online_order k where k.order_date > 01.01.2009'
and k.customer = 16244 group by 1 having count(*) > 1
)
order by x.order_id

B)
with temp_order(order_id) as
(select distinct k.order_id from online_order k
where k.order_date > 01.01.2009'
and k.customer = 16244
and exists(select null from online_order k2
where k2.order_date > 01.01.2009'
and k2.customer = 16244
and k2.rdb$db_key > k.rdb$db_key))
select x.* from online_order x
join temp_order t on x.order_id = t.order_id
order by x.order_id

These two statements do the same thing, and everything is part of one
statement. There's nothing that is available outside the statements
themselves, not even temp_order.

The main reason a) is dead slow, is that the subselect is executed for
every potential row in online_order x. In this case, that is not
necessary, so I move things out by doing:

C)
with temp_order(order_id) as
(select k.order_id from online_order k where k.order_date > 01.01.2009'
and k.customer = 16244 group by 1 having count(*) > 1)
select x.* from online_order x where x.order_id in
(select t.order_id from temp_order t)
order by x.order_id

Generally, I find JOIN to be superior to IN(<subselect>) (though I
haven't measured whether it matters when using CTE), and this statement
can easily be changed into a join by adding DISTINCT:

D)
with temp_order(order_id) as
(select DISTINCT k.order_id from online_order k where k.order_date >
01.01.2009'
and k.customer = 16244 group by 1 having count(*) > 1)
select x.* from online_order x
JOIN TEMP_ORDER T ON x.order_id = t.order_id
order by x.order_id

Using EXISTS rather than COUNT(*) is just because EXISTS generally seems
quick and stops after the first match, whereas count(*) may have to
count lots of rows - and generally in this case you do not care whether
the count is 2 or 200000. Since EXISTS just checks for existence,
there's at best no point in adding FIRST 1 within the EXISTS, but
hopefully the optimizer recognizes that it is irrelevant if you do add it.

HTH,
Set

eddressel wrote:
> Wanting to brush up on my SQL.... where do I find help on the new
> features in 2.x? Maybe these are not 2.x issues... I'm looking at your
> SQL, and it's well over my head. If there's a better place to answer
> these questions, please let me know.
>
> Specific questions:
>
>> with temp_order(order_id) as
>
> that creates a temp table, correct? It's only available inside the
> current transaction, correct?
>
>> and exists(select null from online_order k2
>> where k2.order_date > 01.01.2009'
>> and k2.customer = 16244
>> and k2.rdb$db_key > k.rdb$db_key))
>
> If I was writing this I would have probably added 'select first 1...':
> would that make sense?
>
>> select x.* from online_order x
>> join temp_order t on x.order_id = t.order_id
>> order by x.order_id
>
> I thought it created a temp order, what is x.*?
>
> Thanks
> Ed Dressel