Subject Re: How to handle a large 'IN' clause?
Author phil_hhn
--- In firebird-support@yahoogroups.com, Pavel Menshchikov
<mpn2001@y...> wrote:
> Hello Phil,
>
>> ... but even if the data is in a temporary table, would that not
>> still give me a problem when I select over 1500 records in the 'in'
>> clause? When the user selects some records this is completely
>> arbitrary, and there is no database relationship that can cater for
>> this.
>>
>> Unless you meant that I should put all the selected records into a
>> temp table (Hmmm, I wonder how quick that'd be...) and select all
>> PKs from that as the argument in the original query's 'in' clause...?
> Kjell meant that you should create a temporary table with the user ID
> and the PK of your original table (for example). Then you fill the
> temporary table with the data that the user (un)specified in your
> original table, and perform a join, or use a construction like
> ... IN (SELECT OrigPK FROM TempTable WHERE UserID = :UID)
> AFAIK, joins are more efficient than the construction above (IN(...)).
>
> Best regards,
> Pavel Menshchikov

I wanted to post an update here because one of my colleagues got
around to addressing this recently. She did some experimenting, and
the results aren't what we expected.

She decided to create a temp table and put the data in it, so she
could take the approach
"...IN (SELECT OrigPK FROM TempTable WHERE UserID = :UID)..."
that Pavel suggested. So when a user selects the relevant rows
(e.g. 5000 of them), she puts the 5000 PKs into a temp table and then
re-queries the database, joining against this temp table. She found the
re-query is significantly faster, which is good; however, putting all
the selected PKs into the temp table adds considerable overhead!
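For anyone following along, the temp-table pattern she tried looks roughly like this. This is only a minimal sketch using Python's stdlib sqlite3 as a stand-in for Firebird (the pattern is the same with a Firebird driver, only the connect call differs), and the table/column names (`orig`, `selected`, `orig_pk`) are made up for illustration:

```python
import sqlite3

# In-memory database standing in for the real server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orig (pk INTEGER PRIMARY KEY, payload TEXT)")
con.executemany("INSERT INTO orig VALUES (?, ?)",
                [(i, f"row {i}") for i in range(1, 5001)])

# 1. Fill a temp table with the user's arbitrary selection.
#    This insert loop is the step that turned out to be the bottleneck.
con.execute("CREATE TEMP TABLE selected (user_id INTEGER, orig_pk INTEGER)")
user_id = 1
picked = [7, 42, 4999]  # whatever the user happened to click on
con.executemany("INSERT INTO selected VALUES (?, ?)",
                [(user_id, pk) for pk in picked])

# 2. Re-query with a join instead of a huge IN list.
rows = con.execute(
    "SELECT o.pk, o.payload FROM orig o "
    "JOIN selected s ON s.orig_pk = o.pk "
    "WHERE s.user_id = ? ORDER BY o.pk",
    (user_id,)).fetchall()
print(rows)  # only the three selected rows come back
```

With thousands of picked rows, step 1 becomes thousands of inserts per user action, which matches what she observed.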

So instead of using the temp table, she then used multiple IN clauses:
"...PK IN (1, 2, 3, ...1000) OR PK IN (1001, 1002, 1003, ...2000) OR
... OR PK IN (4001, 4002, 4003, ...5000)"

She found that although the length of the query is horrendous, having
one query /does/ work, and overall it is significantly faster than
putting the selected rows into a temp table and running the query with
the join to the temp table.
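That chunked WHERE clause can be generated mechanically. A sketch, with the caveats that the 1000-per-chunk size is just the figure from this thread (not anything Firebird mandates), and that splicing values straight into the SQL like this is only safe for trusted integer PKs:

```python
def chunked_in_clause(column, values, chunk_size=1000):
    """Build "col IN (...) OR col IN (...)" for an arbitrarily long
    value list, keeping each IN list under the engine's per-list limit.

    Values are interpolated directly into the SQL text, so this is
    only safe for integers you generated yourself, never user strings.
    """
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    return " OR ".join(
        f"{column} IN ({', '.join(str(v) for v in chunk)})"
        for chunk in chunks)

pks = list(range(1, 5001))
where = chunked_in_clause("PK", pks)
# 5000 values -> five IN lists of 1000, joined by four ORs
print(where.count(" OR "))  # 4
```

The resulting string is then dropped into the original query's WHERE clause, giving one (very long) statement instead of one join plus thousands of inserts.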


It seems creating/inserting the data into the temp table is what kills
it (and yes, a prepared statement was used).

Should this be so??

Cheers