Subject Re: [firebird-support] Query with indexes used takes over 21 minutes
Author Ann W. Harrison
Carsten Schäfer wrote:
> Is this normal behaviour of Firebird ?
> I can't believe that someone can use Firebird for reports
> where the user can make his own query, because it's so
> easy to break firebird.

The short answer is that you did this to yourself by trying to outsmart
the optimizer. The long answer follows.



> SELECT count(t_apos.id_apos)
> FROM t_apos
> JOIN t_auftrag ON f_id_auftrag + 0 = id_auftrag
> WHERE t_apos.f_id_anliefer = 10
> AND t_auftrag.f_adatum between '07.12.2004 00:00' AND '08.01.2005 11:23'
> AND t_apos.f_id_auftrag + 0 != 0

OK, I suspect that the "+ 0" terms are added to try to force specific
index use. Why? The second one - "AND t_apos.f_id_auftrag + 0 != 0"
isn't doing anything but cluttering up the query, because Firebird never
uses an index for "!=".

And, if you're rewriting the query to force index use, why not add
consistent aliasing, just as insurance against inadvertent ambiguity?

If I were looking at the query, ignoring the +0 terms, I'd guess that a
reasonable plan would select the t_auftrag records between the date
ranges and then lookup the matching t_apos records using the foreign
key. Is this query equivalent, and how does it perform?

SELECT count(x.t_apos.id_apos)
FROM t_apos x
JOIN t_auftrag y ON x.f_id_auftrag = y.id_auftrag
WHERE x.f_id_anliefer = 10
AND y.f_adatum between '07.12.2004 00:00' AND '08.01.2005 11:23'
AND x.f_id_auftrag != 0

> RDB$FOREIGN62 = foreign key on t_apos.f_id_anliefer
> PlanAnalyzer says that Firebird need 82.194.000 indexed reads on t_apos
> and 2.280 indexed reads on t_auftrag
> Result of count is 769, t_apos has 221.063 rows and t_auftrag has 97.751 rows.

The plan chosen selects the t_auftrag records between the two dates -
one would assume 2.280 of them - and for each one finds the 36.050
t_apos records with f_id_anliefer = 10 then finds the matches between
the two groups. (2.280 * 36.050 = 82.194.000) That's pretty horrible.
but the optimizer is looking for the lowest cost first cut and 2.280 is
less than 36.050, so given two independent indexes and no other bridge,
it's not such a bad choice.

A full sort merge would be better, but that's a hard choice - everybody
knows that sort merge (natural, natural) is horribly expensive.
Everybody's wrong. The nested loop does 82.196.280 record fetches while
the sort would do only 318.814 - but what's a factor of 258:1 in fetches
(some of them physical disk reads) compared to a scary (in memory) sort?

What I mean by "no other bridge" is that the expression
"f_id_auftrag + 0 = id_auftrag" is discarded as an indexable connection
between the two tables because of the artificial computation. You've
eliminated the possiblity of using either the foreign key on t_apos or
the primary key on t_auftrag.

Guessing from what you've sent, the plan you hope to see starts by
finding the t_apos records with f_id_anliefer = 10 and then uses the
primary key index on t_auftrag to find the match. If the matched row is
in the right date range and the primary key isn't 0, then you've got a
record to count. My guess on the cost of that is 72.100 - finding the
initial 36.050 t_apos records and the single matching t_aufrag record.

The plan I suggested above (find the t_auftrag first, then lookup the
t_apos) would start by finding 2.280 t_auftrags in the date range, then
for each of them lookup t_apos on the foreign key. I have no idea of
the distribution of the t_apos records on that key, but if the values
were evenly divided there would be two or three t_apos's for each
t_auftrag. If so, the total cost would be about 8.000 record fetches.

That sounds like a plan to me.