Subject Re: [firebird-support] IN, sub-select and select distinct
Author Arno Brinkman
Hi,

> > > select pres.productid
> > > from sys_idlists L1
> > > inner join presence pres on L1.id = pres.outletid
> > > inner join sys_idlists L2 on pres.status = L2.id
> > > where
> > > L1.groupid = 1 and L2.groupid = 2
> > > and pres.valuedate >= '2003-09-01'
> > > and pres.valuedate < '2003-11-01'
> > >
> > > --> 481ms
> > > --> 590ms with distinct
> > > --> 590ms with group by
> > >
> > > Again, sys_idlists contains only the 9 required records.
> >
> > Also show us the PLANs.
>
> With fetchall, results become :
>
> "Explicit IN"
>
> naked : 541ms
> distinct : 451ms
> group by : 581ms
>
> Plans are :
> PLAN (PRES INDEX (PRESENCE_IDX1,RELATION_262,RELATION_262,
> RELATION_262,RELATION_262,RELATION_262,RELATION_262))

> PLAN SORT ((PRES INDEX (PRESENCE_IDX1,RELATION_262,RELATION_262,
> RELATION_262,RELATION_262,RELATION_262,RELATION_262)))

In fact it is wierd that the "distinct" is faster as the "naked" one, but i
guess that's due the cache or the returning records are a huge different
between "naked" and "distinct" (then would sending/receiving records over
the wire be expensive here).

> PLAN (PRES ORDER RELATION_296)

This explains why a GROUP BY is somewhat slower.

> "JOIN" versions
>
> naked : 922ms
> distinct : 711ms
> group by : 711ms
>
> Plans are :
> PLAN MERGE (SORT (L2 NATURAL),SORT (JOIN (L1 NATURAL,
> PRES INDEX (RELATION_262))))

> PLAN SORT (MERGE (SORT (L2 NATURAL),SORT (JOIN
> (L1 NATURAL,PRES INDEX (RELATION_262)))))

Same "strange" situation as without JOIN.

> PLAN SORT (MERGE (SORT (L2 NATURAL),SORT (JOIN
> (L1 NATURAL,PRES INDEX (RELATION_262)))))

Now you've the same PLAN as the "distinct" and thus the speed is the same.

> > > I can't believe there is no way to achieve the "explicit IN"
performance
> > > with a JOIN, a sub-select or anything else.
> >
> > A JOIN or sub-select needs always disk reads and there's no way to
compare
> > that with hard given constants.
>
> I understand but the sys_idlists table really only contains the ids for my
> IN predicates. So, I thought sub-selects would perform just as well as
hard
> constants.

In fact it should come very close.

> ..PRES.OUTLETID IN (112191, 112193, 112194, 112200, 112201, 112205)
> ..pres.outletid in (select id from sys_idlists where groupid = 1)
>
> But results are worse than with JOINs. I'm above the 16s in each case.

16 seconds difference? What are both times?
How many records does sys_idlists have? Could you provide some more details
about the tables in question. I'm interested in why the difference is so
huge, want i don't understand at the moment. Okay, some difference is what i
expected.

> Actually, my ultimate goal is to have parameterized queries. I could build
> the SQL dynamically, but I also need to apply additional WHERE constraints
> on those queries sometimes. Currently I'm using selectable SPs building
the
> IN predicates from string params and calling EXECUTE STATEMENT.
> Works great, but not very clean...
> That's why I thought to that sys_idlists table filled (and emptied in the
> same tx) with the IDs required for my IN predicates.

Okay, should work fine i think. If you've a index on that table it's also
interesting to clean up the entries by running "SELECT Count(*) FROM
sys_idlists" in a new transaction after you finished the running
transaction.

Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Firebird open source database (based on IB-OE) with many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/

Support list for Interbase and Firebird users :
firebird-support@yahoogroups.com

Nederlandse firebird nieuwsgroep :
news://80.126.130.81