Subject Between vs IN
Author Adam
Hi Group,

This is partly a follow up to this post, and partly a new question.

http://groups.yahoo.com/group/firebird-support/message/65471

If you have read the long blurb on the linked post, you can jump to
the next paragraph. Otherwise, this is what you need to know. We had a
function in Delphi that converts a dataset to a comma separated list.
We modified it so it split the list into two if it contained 1500
records, which buys us significant scalability.

We have analysed the problem a bit further, and in our environment
anyway, there are likely to be a significant number of sequential
numbers. One idea we were floating was to switch between IN blocks and
between blocks depending on which is smaller.

For example,

ID in (1,2,3,....50,55,56,65,66,67,.....,500, etc)

could be represented as

(ID between 1 and 50) or (ID between 55 and 56) or (ID between 65 and
500) or ....

which will usually be significantly smaller. You could also represent
this as

(ID between 1 and 50) or (ID in (55,56)) or (ID between 65 and 500) or
....

Now obviously if the IDs are scattered, then the IN is the sensible
way to approach it, and if they are sequential and there are more than
2 or 3, then between will be more efficient. I was thinking that I
could create both the between blocks and the equivalent IN block then
choose the best approach for each subset of sequential numbers,
keeping in mind that the IN will do an indexed search for each element
and the between only does a single indexed search (well per between
anyway). In theory, this means that the between should execute
slightly faster than IN where you have more than one record.

We have also toyed with the idea of creating a table with the
current_transaction as part of the key, filling it with records then
joining to it to effectively create the IN clause. I think I heard
someone say the approach worked but the insert performance then became
an issue, I imagine about 1000 records per second for what an IN
clause does quite quickly.

I am really after the approach used by folk in this list to manage
which records are returned, where the interface allows for large
subsets (or even automatically generates the subsets). Obviously a
normal human isn't going to individually tick 1500 boxes, but when it
is automated it is possible and will in fact happen if the customer is
large enough.

Adam