Subject Re: [firebird-support] IN, sub-select and select distinct
Author Jerome Bouvattier
Hi,

> > What surprises me, is that I cannot seem to find something at least as
> fast
> > as the "explicit IN" version. And I just noticed that if you have two IN
> > predicates then the "explicit IN" is always faster, w/o distinct.
> >
> > SELECT PRES.PRODUCTID
> > FROM PRESENCE PRES
> > WHERE
> > PRES.OUTLETID IN (112191, 112193, 112194, 112200, 112201, 112205)
> > and PRES.VALUEDATE >= '2003-09-01'
> > and PRES.VALUEDATE < '2003-11-01'
> > and PRES.STATUS IN (1,2,3)
> >
> > --> 50ms
> > --> 330ms with distinct
> > --> 80ms with Group by !!!
>
> Are you doing a fetchall when measuring? The difference between distinct
and
> GROUP BY is not clear to me here.

You're right, I forgot to turn FetchAll on. Then, perf. difference are not
that dramatic, but still exists.
See below...

> > select pres.productid
> > from sys_idlists L1
> > inner join presence pres on L1.id = pres.outletid
> > inner join sys_idlists L2 on pres.status = L2.id
> > where
> > L1.groupid = 1 and L2.groupid = 2
> > and pres.valuedate >= '2003-09-01'
> > and pres.valuedate < '2003-11-01'
> >
> > --> 481ms
> > --> 590ms with distinct
> > --> 590ms with group by
> >
> > Again, sys_idlists contains only the 9 required records.
>
> Also show us the PLANs.

With fetchall, results become :

"Explicit IN"

naked : 541ms
distinct : 451ms
group by : 581ms

Plans are :
PLAN (PRES INDEX (PRESENCE_IDX1,RELATION_262,RELATION_262,
RELATION_262,RELATION_262,RELATION_262,RELATION_262))
PLAN SORT ((PRES INDEX (PRESENCE_IDX1,RELATION_262,RELATION_262,
RELATION_262,RELATION_262,RELATION_262,RELATION_262)))
PLAN (PRES ORDER RELATION_296)

"JOIN" versions

naked : 922ms
distinct : 711ms
group by : 711ms

Plans are :
PLAN MERGE (SORT (L2 NATURAL),SORT (JOIN (L1 NATURAL,
PRES INDEX (RELATION_262))))
PLAN SORT (MERGE (SORT (L2 NATURAL),SORT (JOIN
(L1 NATURAL,PRES INDEX (RELATION_262)))))
PLAN SORT (MERGE (SORT (L2 NATURAL),SORT (JOIN
(L1 NATURAL,PRES INDEX (RELATION_262)))))

> > I can't believe there is no way to achieve the "explicit IN" performance
> > with a JOIN, a sub-select or anything else.
>
> A JOIN or sub-select needs always disk reads and there's no way to compare
> that with hard given constants.

I understand but the sys_idlists table really only contains the ids for my
IN predicates. So, I thought sub-selects would perform just as well as hard
constants.

..PRES.OUTLETID IN (112191, 112193, 112194, 112200, 112201, 112205)
..pres.outletid in (select id from sys_idlists where groupid = 1)

But results are worse than with JOINs. I'm above the 16s in each case.

Actually, my ultimate goal is to have parameterized queries. I could build
the SQL dynamically, but I also need to apply additional WHERE constraints
on those queries sometimes. Currently I'm using selectable SPs building the
IN predicates from string params and calling EXECUTE STATEMENT.
Works great, but not very clean...
That's why I thought to that sys_idlists table filled (and emptied in the
same tx) with the IDs required for my IN predicates.

Maybe there is another solution you can think of ?

Thanks for your help.

--
Jerome