Subject | The IN(select...) predicate seems slow |
---|---|
Author | Greg At ACD |
Post date | 2005-01-04T17:19:26Z |
Platform: WinXP SP2
DB Version: Firebird 1.5.1 Superserver
I am having some interesting results with the IN () clause,
particularly if there's a subquery within the IN clause.
For example, I have an ASSET table with 100,000 records in it. The
column ASSETID is a primary key.
The following query:
SELECT ASSET.ASSETID
FROM ASSET
WHERE ASSET.ASSETID in (36);
This returns very quickly. Not surprisingly, the resulting plan is
as follows:
PLAN (ASSET INDEX (PK_ASSET))
However, if I rework the query as follows:
SELECT ASSET.ASSETID
FROM ASSET
WHERE ASSET.ASSETID in
(select 36 from rdb$database);
The query is MUCH slower, and the resulting plan is as follows:
PLAN (ASSET NATURAL)
PLAN (RDB$DATABASE NATURAL)
So, it appears that the query is is doing a table scan for ASSETID =
36 instead of using the PK index.
Note that the query is purposely simplified to show the issue I'm
having. In practice, the query would be something to this affect:
SELECT ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
FROM ASSET
WHERE ASSET.ASSETID
IN (SELECT DISTINCT JOINCATEGORYASSET.ASSETID
FROM JOINCATEGORYASSET
WHERE JOINCATEGORYASSET.CATEGORYID IN (14,15));
So, in this case, the query is exceedingly slow since the subquery
returns 2000 items, and it appears that there's a table scan of the
asset table happening for each item (the query takes about 20
seconds to return).
I could remove the subquery altogether and just join everything up
as well:
SELECT distinct ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
FROM ASSET, JOINCATEGORYASSET
WHERE ASSET.ASSETID = JOINCATEGORYASSET.ASSETID,
and JOINCATEGORYASSET.CATEGORYID IN (14,15);
...but the DISTINCT portion of the query would make this a little
questionable...
Is there a better way of dealing with this type of query?
Thx!
Greg
DB Version: Firebird 1.5.1 Superserver
I am having some interesting results with the IN () clause,
particularly if there's a subquery within the IN clause.
For example, I have an ASSET table with 100,000 records in it. The
column ASSETID is a primary key.
The following query:
SELECT ASSET.ASSETID
FROM ASSET
WHERE ASSET.ASSETID in (36);
This returns very quickly. Not surprisingly, the resulting plan is
as follows:
PLAN (ASSET INDEX (PK_ASSET))
However, if I rework the query as follows:
SELECT ASSET.ASSETID
FROM ASSET
WHERE ASSET.ASSETID in
(select 36 from rdb$database);
The query is MUCH slower, and the resulting plan is as follows:
PLAN (ASSET NATURAL)
PLAN (RDB$DATABASE NATURAL)
So, it appears that the query is is doing a table scan for ASSETID =
36 instead of using the PK index.
Note that the query is purposely simplified to show the issue I'm
having. In practice, the query would be something to this affect:
SELECT ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
FROM ASSET
WHERE ASSET.ASSETID
IN (SELECT DISTINCT JOINCATEGORYASSET.ASSETID
FROM JOINCATEGORYASSET
WHERE JOINCATEGORYASSET.CATEGORYID IN (14,15));
So, in this case, the query is exceedingly slow since the subquery
returns 2000 items, and it appears that there's a table scan of the
asset table happening for each item (the query takes about 20
seconds to return).
I could remove the subquery altogether and just join everything up
as well:
SELECT distinct ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
FROM ASSET, JOINCATEGORYASSET
WHERE ASSET.ASSETID = JOINCATEGORYASSET.ASSETID,
and JOINCATEGORYASSET.CATEGORYID IN (14,15);
...but the DISTINCT portion of the query would make this a little
questionable...
Is there a better way of dealing with this type of query?
Thx!
Greg