Subject RE: [ib-support] Summarization Query Speed Issue
Author sugi
> I see your frustration. When first opening a query, there's the time
> for the prepare, unless you explicitly prepared the query, also
> there's the loading of the data pages into the server when the query
> is first executed. When running the same query immediately after the
> first one, the data is already retrieved from the hard-drive. But, I
> notice in one instance, it doesn't help in the execution time.
Yes, that would be when I have a fairly complex query (multiple left
joins). I'm refraining myself from second-guessing the firebird
optimizer, but I am a bit uneasy with the query itself (especially the
WHERE clause).

===
CASE 4 : Find Total Sales For Each Product for certain months.
select p.productID, sum(o.qty)
from product p
left join orderdetail o on (p.productID = o.productID)
left join ordermaster m on (o.ordermasterid = m.ordermasterid)
where (m.transdate between '2003-01-01' and '2003-02-01')
group by p.productID
RESULT:
- PLAN SORT (JOIN (JOIN (P NATURAL,O INDEX
(FK_ORDERDET_REFERENCE_ITEMS)),M INDEX (PK_ORDERMASTER)))
===

The problem seems to come from the left join. Please refer to the
original message (Case 1 and Case 2) where a two-table join query using
inner JOIN was changed to LEFT JOIN, and the performance drops from 9
sec (first run) and ~2 sec (subsequent runs) to 32 sec (first run) and
~20 sec (subsequent runs) respectively. This is a very big difference,
more than three times slower, so naturally I'm wondering whether I'm
doing something wrong.

> What version of Interbase or Firebird are you using?
The production database is on Firebird 1.0.0.796 on Win32 (Win2K).

> Have you tried Firebird 1.5 beta 3? You'll probably notice a big
> difference in the performance.
Just downloaded the beta. I'll try to summarize my results with this
version below,

CASE 1:
select p.productID, sum(o.qty)
from product p
join orderdetail o on (p.productid = o.productid)
group by p.productID

Result:
PLAN JOIN (P ORDER RDB$PRIMARY2,O INDEX (FK_ORDERDET_REFERENCE_ITEMS))
First run : 4 seconds (1.0.796 = 9 seconds)
Subsequent runs : ~1.5 seconds.(1.0.796 = ~2 seconds)

CASE 2: (changed the above query from JOIN to LEFT JOIN)
Result:
PLAN SORT (JOIN (P NATURAL,O INDEX (FK_ORDERDET_REFERENCE_ITEMS)))
First run : 20 seconds (1.0.796 = 32 seconds)
Subsequent runs : ~17 seconds. (1.0.796 = ~20 seconds)

CASE 3: (Already fast enough in 1.0.796)
select o.productID, sum(o.qty)
from orderDetail o
group by o.productID
Result: no change.

CASE 4: (Quoted above, three table left joins with a where clause on
date column)
Result:
PLAN SORT (JOIN (JOIN (P NATURAL,O INDEX (FK_ORDERDET_REFERENCE_ITEMS)),
M INDEX (PK_ORDERMASTER)))
First run : ~60 seconds (1.0.796 = ~60 seconds)
Subsequent Runs : ~51 seconds (1.0.796 = ~60 seconds)

> Some are reporting differences between
> 20-2000%
In my tests of FB 1.5 beta 3, the queries are generally faster, but some
queries are still very slow, please see the fourth case above (around 60
seconds). All queries involving LEFT joins is much much slower than
their inner join version; is there any way to speed this up?

I can only think if two alternatives right now: view or storedproc.
My testing with views gave disastrous results... (more than 5 minutes,
end-tasked that one).

The SP approach is shown below, should give similar results as Query#2
(LEFT JOINS):
...
CREATE PROCEDURE SPSALES
RETURNS (
PRODUCTID INTEGER,
QTY NUMERIC(18,4))
AS
begin
/* Procedure Text */

for select p.productid from product p
into :productID do begin

select sum(o.qty) from orderDetail o
where (o.productid = :productID)
into qty;

if (qty is null) then qty = 0;
suspend;
end
end
...

The PLAN generated was :
PLAN (O INDEX (FK_ORDERDET_REFERENCE_ITEMS))(P NATURAL)
And 'select * from spSales' was very fast (around 3 seconds) compared to
the left join approach. Is there anyway I can tune the query to match
the StoredProc's speed?

> depending on the query and indices that exist in the table.
Can you please advise me on what extra indices that might help in my
particular case?

My other question was about how can I decide whether a particular query
is 'slow' or 'fast'... in which case I can stop worrying about the
performance and just accept it as FB's given constant, will not get any
faster no matter what I do.

Many thanks,
sugi.