Subject Re: [Firebird-Architect] IN LIST
Author = m. Th =
Adriano dos Santos Fernandes wrote:
> I said... you that never understand things and go with this phrase...
>
> It's dependent from the user...
>
> Right now, I'm writing a IMAP server.
>
> The user is working in his email client, then he decides to select some
> messages.
> He selected all messages from 2006-01-01 to 2006-01-31 and select
> another message from 2006-07-01 and mark as read.
>
> The email client send to the server a list of message ids:
> 10000:11000,13000
>
> 10000 is id of message from 2006-01-01
> 11000 is id of message from 2006-01-31
> 13000 id id of message from 2006-07-01
>
> Then now I know you will say that I'm trying to put in the database
> something that belongs to a IMAP server...
>
> But just think... This is a case use of well written applications that
> allows the user to filter things based on his need.
>
>

:))) I imagined... A user multi-selection. :)

For the beginning, let me stress that it depends of how you designed
your application, but this can be easily and fast worked around using an
IN clause in which you build the comma-delimited members list. It will
work (with a small trick) very fast for _any_ number of IN members till
you hit the 64k query limit (ie. somewhere around of 11800 members). On
my test system (Athlon 64 3GHz, 1GB RAM, SATA HDD, WinXP Fb 2 SS), on
real-life databases the query with ~11800 items opened in ~ 2 secs from
which ~ 1.6 secs was the prepare time.

So, based on my small experience on this area, these are the following
things that matter:

1. The IN predicate can be made very easily, fast and safe to accept an
'unlimited' number of members. Let me stress that /easily, fast, and
safe/ makes this possibility attractive. How? By pre-parsing the IN list
before sending it to the actual parser. (Now I feel that I didn't
explain the things how it should... Dmitry can you help me?) In fact I
do this on the client side:

function ParseInMembers(aField, aInMembers: string; aNotIn: boolean=False): string;
//const
// MAX_MEMBERS = 1499; - commented for testing purposes

var
cRelBetween: string;
cCrt: char;
nCommas: integer;

begin
if aNotIn then
begin
Result:=' '+aField+' NOT IN (';
cRelBetween:=') AND '+aField+' NOT IN (';
end
else
begin
Result:=' '+aField+' IN(';
cRelBetween:=') OR '+aField+' IN (';;
end;

nCommas:=0;
for cCrt in aInMembers do
begin
if cCrt=',' then
begin
inc(nCommas);
if nCommas=MAX_MEMBERS then
begin
Result:=Result+cRelBetween;
nCommas:=0;
end
else
Result:=Result+cCrt;
end
else
Result:=Result+cCrt;
end;
Result:=Result+')';
end;

While this (IMHO) is an useful thing to have it on the engine (due to
parsing mainly, a (possible) query parameter support, eating a variable
amount from over-the-wire 64k, not much though I admit), the concrete
situations which require it (which are more and more present today) IMHO
doesn't require the implementation of this feature if a significant
coding effort (or testing) is required. But here is crucial (IMHO) what
the other users say...

2. As you see above, the ranges, are really required from my POV only
when you hit 64k query limit. As an aside, it seems that we remained the
last ones AFAIK together with Informix (which anyway isn't so much
pushed up by IBM) with this limit (since DB2 V8 raised the bar at
"2,097,152 bytes" - see
http://www.informit.com/content/images/0672326132/downloads/appd.pdf -
do a find for 'Largest SQL statement'). The next smallest one is AFAIK
MySQL with 1024k. But if you try to send more than 11000 items, perhaps
you'll have another problems. For ex. from where do you have these keys?
:) Are you really need to select more than 11500 items/select?

3. The ranges is another 'tricky' feature on user selected datasets.
Let's say that you have a DBGrid which have as dataset something like
'Select * from emails where user='Adriano' and deleted=0'; Even the
table will have more recs the user will see only those recs that he
should see (that's ok). If the user sees the emails 1,2,3, 5, 9,10, 13
and selects the records (using the [shift] ie. a range) 3 till 10 and
after this (using [ctrl]) selects also 1 and 13 you must build a query
like 'select * from emails where user='Adriano' and deleted=0 and id in
list(3..10, 1, 13)' which can be (much) slower and complicated to build
than 'select * from emails where id in (1,3,5,9,10,13)' (which usually
uses the PK index only). Especially if your 'original' query is slow...
In fact this is a classical [R-]OLAP refining... but, again, here it
depends on what do you want to achieve in your app and its internals. I
just wanted to point that ranges can be slippery.

4. Parametrized. I think that the main feature of the LIST is the
possibility to make it to accept parameters AFAIS. If the IN (or at
least a variant of it...) can be made to accept parameters then we'll
have a great speed gain (as I wrote above for 11800 items from 2 secs
execution time, 1.6 secs was prepare time, ie more that 75%!). But if
it's parametrized then the trick with ID IN (<1499 members>) OR ID IN
(<1499 members) [OR ID IN (<1499 members)...] must be put somewhere on
server side - on EVL layer perhaps? (you know much better than me :) )

hth,

m. th.