Subject RE: [firebird-support] Large record count - will it work?
Author Svein Erling Tysvær
In theory, all simple cases are easily solved ;o)

select * from a where a.a in (select b.a from b)

is of course simple to do the way you suggest (and it would probably be quicker in some cases, but I think the optimizer already translates this kind of queries into EXISTS rather than IN, so I doubt it would be quicker in all cases). The slightly more complex

select * from a where a.a in (select b.a from b where a.b < b.b)

could possibly be solved by retrieving the combination b.a and b.b once as an intermediate step. But when shall such 'optimization' be done? Doing this for the query

select * from a where a.pk = 123456 and a.a in (select b.a from b where a.b < b.b)

would clearly be inefficient if b contained one million records (and whilst the optimizer knows the selectivity of indexes, it does not contain histograms, so it cannot know anything for not-too-selective columns compared against an unusual value).

I'm happy that Firebird leaves this kind of decisions to me as a user, sometimes it is a bit annoying to have to use EXISTS on large tables (on Firebird 1.5), but with the relatively new WITH statement, this can often be circumvented, e.g.

WITH tempb(a, bmax) AS
(SELECT b.a, max(b.b) FROM b GROUP BY 1)
SELECT a.*
FROM a
JOIN tempb ON a.a = tempb.a
WHERE a.b < tempb.bmax

could be a quick way to execute the above query if 'b' contained 1000 records and 'a' 1 million records, but I'd be surprised if the optimizer was sophisticated enough to figure out this for me. And we're still only talking about simple cases...

Set

-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Kjell Rilbe
Sent: 18. februar 2009 09:18
To: firebird-support@yahoogroups.com
Subject: Re: [firebird-support] Large record count - will it work?

Thomas Steinmaurer wrote:

> > Aage Johansen wrote:
> >> If ECO generates Firebird-unfriendly SQL you may find yourself in a
> >> position where you cannot really fine tune the queries. This may be
> >> the case for any db system, and I don't know ECO at all. Just a
> >> faint recollection of someone having problems with generated SQL
> >> which were not optimized well (don't remember which db server).
> >
> > If that happens I hope the ECO people will try to generate better SQL or
> > at least help me inject special SQL where needed.
>
> This is in fact a general problem with OR mappers (although ECO isn't an
> OR mapper only) that the generated SQL isn't always optimal. ;-)

I believe ECO lets you use whatever custom SQL you want to retrieve
data, but as far as I know, you cannot change the SQL that IS
autogenerated. But it *might* be possible. ECO is very powerful in many
ways and I think this is one of them...

Anyway, regarding the problem I've most often stumbled upon: correlated
subqueries, I'd think that FB could be improved to perform better. If I
understand correctly the current approach is to execute the subquery
once for each "master" record. Obviously very inefficient if the
subquery can't use an index.

But in theory, FB could start by building a list of "link values" and
then execute the subquery once, retrieving all matching records at once,
and then merge into the "master" result set.

In theory, but can it be done "easily" in the FB engine?

Kjell
--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: kjell@...
Telefon: 08-761 06 55
Mobil: 0733-44 24 64