Subject Re: perf WHERE IN (SUBQUERY)
Author legrand_legrand_63
Hi Svein,
I appreciate a lot your quick and detailled answer!

But I'm quite confused, why
select * from a
where a.id in (select b.id from b where l ='a') ;
can't work like
select * from a
where a.id = (select b.id from b where l ='a')
PLAN (B NATURAL) PLAN (A INDEX (A_ID))
?
In the second exemple Firebird first read (once) B and access A using
A_ID index with the corresponding data.

In fact I'm posting that question because this IN statement was
originaly done in Oracle without any performance problem, Oracle First
read (once) the subquery on B and reads A using A_ID index ...

My table A is quite big, and performances are not equivalent to Oracle
... I would have preferred to hear that there is an optimizer feature
request planned in spite of rewriting the IN statements ;o)

Regards
PAscal

--- In firebird-support@yahoogroups.com, Svein Erling Tysvaer
<svein.erling.tysvaer@...> wrote:
>
> legrand_legrand_63 wrote:
> > Here is a simple test case:
> >
> > select * from a
> > where a.id in (select b.id from b where l ='a') ;
> >
> > gives:
> > PLAN (B INDEX (B_ID)) PLAN (A NATURAL)
> >
> > Would it be possible to access A with index A_ID ?
> > like in:
> >
> > select * from a
> > where a.id in (1,2)
> >
> > PLAN (A INDEX (A_ID, A_ID))
> >
> > Regards
> > PAscal
>
> Nope, Pascal, that's impossible. What you're asking Firebird to do, is
> to for each record in A, to select all records in B with a matching L
> and then compare. That means you're asking Firebird to perform the
query
> on B as many times as there are records in A (a very bad idea on
largish
> tables). Now, from Firebird 1.5 onwards (I think), Firebird isn't going
> to honour your request, rather, Arno decided to change such a request
> into the equivalent (and potentially much quicker)
>
> select * from a
> where exists(select * from b where b.id = a.id and b.l = 'a')
>
> Still, the query has to check every if there is a matching B for every
> record in A, and if it has to go through the entire table, there is no
> quicker way than to go NATURAL.
>
> A way to improve your query, could be
>
> select a.* from a
> join b on a.id = b.id
> where b.l = 'a'
>
> which could give you the following plan (I'm a bit uncertain about the
> syntax):
>
> PLAN JOIN (B INDEX (B_L), A INDEX(A_ID))
>
> For this to return the same set as your original query, B.ID must be
> unique. If not, you have to add DISTINCT (which in itself may be
> incorrect if there are supposed to be duplicate rows in A) or a further
> NOT EXISTS clause (this may be tailored to return the exact same result
> regardless of duplicates).
>
> I don't know whether it would be possible to improve Firebirds ways of
> handling such requests. Obviously, the example above is very simple and
> it should be possible to make the optimizer perform better (see how I
> think it theoretically could be solved below), but it doesn't take much
> additional complication to at least confuse my brain. And I'm not able
> to imagine neither the potential problems or benefits in making
Firebird
> do similar conversions.
>
> HTH,
> Set
>
> (here's below)
>
> select a.* from a
> join b on a.id = b.id
> where b.l = 'a'
> and not exists(select * from b2
> where b.id = b2.id
> and b2.l = 'a'
> and b2.rdb$db_key < b.rdb$db_key)
>
> (I'm uncertain about how to use rdb$db_key, replace with primary key if
> available. Also, I think such a request might actually turn out slower
> than the original select in some circumstances)
>