Subject | Re: [firebird-support] Optimizer not very good at IN/EXISTS |
---|---|
Author | Svein Erling Tysvaer |
Post date | 2006-08-08T14:15:53Z |
Hi Mathias, I'll comment your queries within the text as they appear.
Mathias Dellaert wrote:
... for any table in the plan ('table' meaning 'alias'). I'm far from
certain that it doesn't use the S2.SID index under the hood. In general,
seeing any GROUP BY within a subselect ought to make you see if the
query can be rewritten one way or the other (I don't think I've never
had a GROUP BY in a subselect).
opinion) would be
SELECT * FROM SALES S
WHERE NOT EXISTS(SELECT * FROM SALES S2
WHERE S2.PRODUCT = S.PRODUCT
AND S2.SID < S.SID)
generate the subselect for every potential row (at least I think
Firebird has to do the subselect in Firebird 1.5 as many times as there
are rows in the SALES table).
could be able to figure out that it in this query simply could eliminate
the subselect altogether, but in theory the subselect could refer to a
field in the outer select in its WHERE clause, so it is understandable
that S2 has to be executed for each row in S. Since it at this point has
the PK available, adding the product index would only slow things down.
The one thing that I don't think you quite grasp, is that a subselect is
executed as many times as there are rows in the 'outer' select (well, at
least it used to do that). Say your SALES table has 10000 rows. The
subselect is then executed 10000 times, and if you have 100 products,
that would means 10000000 'subresults'.
improvement, but the queries you supplied is obscure enough not to not
make it to my wish list (if I'd had any).
Set
Mathias Dellaert wrote:
> Greetings,One deficit with Firebird 1.5 is that if never showed ORDER ... INDEX
>
> I've noticed that the optimizer seems to be very inefficient when
> working with IN/EXISTS clauses. Here's an example
>
> SELECT * FROM SALES S
> WHERE SID IN (SELECT MIN(SID)
> FROM SALES S2
> GROUP BY Product)
>
> Uses plan:
>
> PLAN (S2 ORDER IDX_SALES_BASE)
> PLAN (S NATURAL)
... for any table in the plan ('table' meaning 'alias'). I'm far from
certain that it doesn't use the S2.SID index under the hood. In general,
seeing any GROUP BY within a subselect ought to make you see if the
query can be rewritten one way or the other (I don't think I've never
had a GROUP BY in a subselect).
> (IDX_SALES_BASE is an index over the product field, SID is the primaryYup, that should be the same query. A far better way to do this (in my
> key)
>
> Similarly, using an EXISTS:
>
> select * from SALES S
> WHERE EXISTS (SELECT Product
> FROM SALES S2
> GROUP BY Product
> HAVING MIN(S2.SID) = S.SID )
>
> Uses the same plan.
opinion) would be
SELECT * FROM SALES S
WHERE NOT EXISTS(SELECT * FROM SALES S2
WHERE S2.PRODUCT = S.PRODUCT
AND S2.SID < S.SID)
> Where-as the "equivalent" query (in Firebird 2.0 RC2)Here Firebird is free to evaluate s2 before s, hence it doesn't have to
>
> select Sales.* from
> SALES,
> (SELECT MIN(SID) as SID
> FROM SALES
> GROUP BY product) as Blah
> WHERE Sales.SID = Blah.SID
>
> Uses plan:
>
> PLAN JOIN (BLAH SALES ORDER IDX_SALES_BASE, SALES INDEX (PK_SALES))
>
> Note how this one actually uses the primary key.
generate the subselect for every potential row (at least I think
Firebird has to do the subselect in Firebird 1.5 as many times as there
are rows in the SALES table).
> Another note: simpler queries generate similar plans, like (and yes INo, that's in my opinion a decent choice of indexes. An ideal optimizer
> realise this is a silly query)
>
> SELECT * FROM SALES S
> WHERE SID IN (SELECT SID
> FROM SALES S2
> WHERE product='PC')
>
> Which uses
>
> PLAN (S2 INDEX (PK_SALES))
> PLAN (S NATURAL)
>
> Which seems even less efficient to me (since it doesn't even use the
> product index).
could be able to figure out that it in this query simply could eliminate
the subselect altogether, but in theory the subselect could refer to a
field in the outer select in its WHERE clause, so it is understandable
that S2 has to be executed for each row in S. Since it at this point has
the PK available, adding the product index would only slow things down.
The one thing that I don't think you quite grasp, is that a subselect is
executed as many times as there are rows in the 'outer' select (well, at
least it used to do that). Say your SALES table has 10000 rows. The
subselect is then executed 10000 times, and if you have 100 products,
that would means 10000000 'subresults'.
> For comparison:No, it is not a bug. It wouldn't surprise me if there still is room for
>
> SELECT SID FROM SALES S2 WHERE product = 'PC'
>
> Uses
>
> PLAN (S2 INDEX (IDX_SALES_BASE))
>
> Now my question: are there any plans to improve the behaviour of the
> IN/EXISTS clause? Is this a bug? It seems to me that now Firebird 2 has
> greater subquery support, there's plenty of room for improvement here.
improvement, but the queries you supplied is obscure enough not to not
make it to my wish list (if I'd had any).
Set