Subject Re: How to handle a large 'IN' clause?
Author Adam
> re-query is significantly faster, which is good, however putting all
> the selected PK's into the temp table adds a considerable overhead!!
> So instead of using the temp table, she then used multiple IN
> "...IN (1, 2, 3, ...1000) OR IN (1001, 1002, 1003, ...2000) OR
> ... OR IN (4001, 4002, 4003, ...5000)"
> She found that although the length of the query is horrendous,
> one query /does/ work and overall it is significantly faster than
> putting the selected rows into a temp table and running the query
> the join to the temp table.
> It seems creating/inserting the data into the temp table is what
> it (and yes, a prepared statement was used).
> Should this be so??

We are planning to do something similar for a similar problem (where
the number of records is based on user selection and can be above
1500). Instead of using the userID, we had planned on using
CURRENT_TRANSACTION because it is guaranteed unique.

I think the thing that will kill the performance is the fact that
even if there are only 10 records matching the criteria, you will
still have to insert the 5000 records into the temp table.

What sort of performance difference are we looking at here, and is it
exponential or linear to the number of selected records.

I can't shed any light from experience, but insert performance is
precisely what concerns us about the approach. On the other hand, the
inserts could happen as they select things, but I imagine this could
get trickier if they then unselect.