Subject Re: [firebird-support] IN operator performance
Author Alexandre Benson Smith
Hi !

Em 22/12/09 20:14, marco bianchini escreveu:
> hi, im so interested about IN clause too,
> id like to ask also why:
>
> A) .. where afield in (select bfield from [some millions row])
>

select * from TableA where Field1 in (select Field2 from TableB)

is internally transformed to

select * from TableA where Field1 exists (select * from TableB where
TableB.Field2 = TableA.Field1)

so, if TableA is big it will be slow, because it needs to touch every
record of TableA.

if TableA is not big and TableB.Field2 is indexed, it should have a good
performance.

The first impression (at least from my point of view) is that the above
query would be splited into 2, something like:

Select Field2 from TableB (store it on a "cache")

select * from TableA were Field1 in (list generated by the the previous
query), but it's NOT the sub-query is executed one time for each record
on TableB.

FB 1.X (IIRC) would use an index for such query, but FB 2.X will NOT use
the index, because of some null handling condition, FB 1.5 is faster for
such query, but could give wrong results in some cases, FB 2.X is
slower, but will always give a correct result.

My sugestion is to use EXISTS instead of IN for such queries, I think
you will have much better performance.


> B) .. where afield in (select distinct bfield .... (some houndreds row in
> resultset, same table of A)
>

Same as above, perhaps a bit slower because of the DISTINCT (sort on the
internal table)

> C) .. where afield in (select<distinct> bfield ....
> A and B/C differs only by primary key/index on bField, aField is always
> indexed, not primary key
> and really looks (at my eyes) so slow.
>

Read above and check-out the plan to see what plan/indices was used.

> i really dont know how to improve it, dont tell me to redesign DB as fb
> support did with others before, i know (or i think to know) how to design
> DB, and i ckecked it so much before posting here: its ok.. and MS-SQL2005,
> with same tables, same data.. is really faster on that query. (ONLY on
> that...)
>

Use EXISTS

The optimizer experts could give some more hints, point corrections to
my answer and perhaps explain better why FB can't perform as well as
other databases in such condition, I remember to read a message from
Dmitry Yemanov that he has some ideas on how to improve it, but it was
not for that moment.

see you !

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br