Subject Re: The IN(select...) predicate seems slow
Author Adam
Hi Greg,

I haven't taken a good look at the query, but from what I can see you
would probably be better off using a join. Something like the
following may be a good place to start.

SELECT DISTINCT ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
FROM JOINCATEGORYASSET
LEFT JOIN ASSET
ON ASSET.ASSETID = JOINCATEGORYASSET.ASSETID
WHERE JOINCATEGORYASSET.CATEGORYID IN (14,15));

I am not sure which fields are indexed, but if
JOINCATEGORYASSET.CATEGORYID is indexed, this will be fast. Check the
plan and optimise it for your needs. Remember to check what happens
if there are any records in JOINCATEGORYASSET but not ASSET. You may
have some nulls you need to deal with (easy as adding changing the
join to an inner join, or adding AND ASSET.ASSETID IS NOT NULL)

Adam


--- In firebird-support@yahoogroups.com, "Greg At ACD"
<GregAtACD@h...> wrote:
>
>
> Platform: WinXP SP2
> DB Version: Firebird 1.5.1 Superserver
>
> I am having some interesting results with the IN () clause,
> particularly if there's a subquery within the IN clause.
>
> For example, I have an ASSET table with 100,000 records in it. The
> column ASSETID is a primary key.
>
> The following query:
>
> SELECT ASSET.ASSETID
> FROM ASSET
> WHERE ASSET.ASSETID in (36);
>
> This returns very quickly. Not surprisingly, the resulting plan is
> as follows:
> PLAN (ASSET INDEX (PK_ASSET))
>
> However, if I rework the query as follows:
>
> SELECT ASSET.ASSETID
> FROM ASSET
> WHERE ASSET.ASSETID in
> (select 36 from rdb$database);
>
> The query is MUCH slower, and the resulting plan is as follows:
> PLAN (ASSET NATURAL)
> PLAN (RDB$DATABASE NATURAL)
>
> So, it appears that the query is is doing a table scan for ASSETID
=
> 36 instead of using the PK index.
>
> Note that the query is purposely simplified to show the issue I'm
> having. In practice, the query would be something to this affect:
>
> SELECT ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
> FROM ASSET
> WHERE ASSET.ASSETID
> IN (SELECT DISTINCT JOINCATEGORYASSET.ASSETID
> FROM JOINCATEGORYASSET
> WHERE JOINCATEGORYASSET.CATEGORYID IN (14,15));
>
> So, in this case, the query is exceedingly slow since the subquery
> returns 2000 items, and it appears that there's a table scan of the
> asset table happening for each item (the query takes about 20
> seconds to return).
>
> I could remove the subquery altogether and just join everything up
> as well:
>
> SELECT distinct ASSET.ASSETID, ASSET.NAME, ASSET.FOLDERNAME
> FROM ASSET, JOINCATEGORYASSET
> WHERE ASSET.ASSETID = JOINCATEGORYASSET.ASSETID,
> and JOINCATEGORYASSET.CATEGORYID IN (14,15);
>
> ...but the DISTINCT portion of the query would make this a little
> questionable...
>
> Is there a better way of dealing with this type of query?
>
> Thx!
>
> Greg