Subject Summarization Query Speed Issue
Author sugi
Dear all,

Here's another query performance question...:)
The schema is simple: CUSTOMER, PRODUCT, SALESMAN, ORDERMASTER,
ORDERDETAIL, with the usual relationships:

//
CREATE TABLE CUSTOMER (
CUSTOMERID TINTEGER NOT NULL,
NAME TSHORTSTRING NOT NULL,
ADDRESS TLONGSTRING NOT NULL );
ALTER TABLE CUSTOMER ADD CONSTRAINT PK_CUSTOMER PRIMARY KEY
(CUSTOMERID);
//
CREATE TABLE PRODUCT (
PRODUCTID TINTEGER NOT NULL,
NAME TSHORTSTRING NOT NULL,
DESCRIPTION TLONGSTRING);
ALTER TABLE PRODUCT ADD CONSTRAINT PK_ITEMS PRIMARY KEY (PRODUCTID);
//
CREATE TABLE SALESMAN (
SALESMANID TINTEGER NOT NULL,
NAME TSHORTSTRING NOT NULL );
ALTER TABLE SALESMAN ADD CONSTRAINT PK_SALESMAN PRIMARY KEY
(SALESMANID);
//
CREATE TABLE ORDERMASTER (
ORDERMASTERID TINTEGER NOT NULL,
CUSTOMERID TINTEGER NOT NULL,
SALESMANID TINTEGER,
DESCRIPTION TLONGSTRING NOT NULL,
ORDERDATE TTIMESTAMP NOT NULL);
ALTER TABLE ORDERMASTER ADD CONSTRAINT PK_ORDERMASTER PRIMARY KEY
(ORDERMASTERID);
ALTER TABLE ORDERMASTER ADD CONSTRAINT FK_ORDERMAS_REFERENCE_CUSTOMER
FOREIGN KEY (CUSTOMERID)
REFERENCES CUSTOMER (CUSTOMERID);
ALTER TABLE ORDERMASTER ADD CONSTRAINT FK_ORDERMAS_REFERENCE_SALESMAN
FOREIGN KEY (SALESMANID)
REFERENCES SALESMAN (SALESMANID);
//
CREATE TABLE ORDERDETAIL (
ORDERDETAILID TINTEGER NOT NULL,
ORDERMASTERID TINTEGER NOT NULL,
PRODUCTID TINTEGER NOT NULL,
QTY TINTEGER NOT NULL,
UNITPRICE TCURRENCY NOT NULL);
ALTER TABLE ORDERDETAIL ADD CONSTRAINT PK_ORDERDETAIL PRIMARY KEY
(ORDERDETAILID);
ALTER TABLE ORDERDETAIL ADD CONSTRAINT FK_ORDERDET_REFERENCE_ITEMS
FOREIGN KEY (PRODUCTID)
REFERENCES PRODUCT (PRODUCTID);
ALTER TABLE ORDERDETAIL ADD CONSTRAINT FK_ORDERDET_REFERENCE_ORDERMAS
FOREIGN KEY (ORDERMASTERID)
REFERENCES ORDERMASTER (ORDERMASTERID);
//

NOTES :
1. The only 'uncommon' thing I did with the schema above is that
ORDERDETAIL's primary key is ORDERDETAILID only, while normally we would
put both ORDERMASTERID and ORDERDETAILID (or LINEITEMNO) as primary key.

2. No additional index were made aside from the PK and FK generated
ones.
3. All index selectivity are recomputed prior to running the queries.
4. Database is on Firebird 1.0.796, Windows Service, Page size is 4096,
Dialect3.
5. running on Athlon XP1600, 512MB.
6. All query tests were done from inside IBExpert Personal Edition
v.1.9.5.23.

The tables were populated with computer generated records, for
benchmarking purposes. (We have the scripts to create and populate the
records available, just in case anyone is interested).
- CUSTOMER : 5,000 records
- PRODUCT : 20,000 records
- SALESMAN : 1,000 records
- ORDERMASTER : 180,000 records
- ORDERDETAIL : approx. 546,000 records

--------------------------------------------------------
CASE 1 : Find Total Sales for each Product.
select p.productID, sum(o.qty)
from product p
join orderdetail o on (p.productid = o.productid)
group by p.productID
RESULT:
- PLAN JOIN (P ORDER RDB$PRIMARY2,O INDEX (FK_ORDERDET_REFERENCE_ITEMS))
- First run, approx. 9 seconds.
- Subsequent runs, approx. 1.5 to 2 seconds (on average).
QUESTIONS:
1. Why is the big difference between runs?
2. I suppose this means that users will have to wait at least 10 seconds
for opening such queries. Is this normal?
3. Anything else that I could do to speed this up?
--------------------------------------------------------
CASE 2 : Same as CASE 1, Change JOIN to LEFT JOIN.
select p.productID, sum(o.qty)
from product p
LEFT JOIN orderdetail o on (p.productid = o.productid)
group by p.productID
RESULT:
- PLAN SORT (JOIN (P NATURAL,O INDEX (FK_ORDERDET_REFERENCE_ITEMS)))
- First run: approx 32 seconds
- Subsequent runs: approx 20 seconds.
QUESTIONS:
1. I read somewhere in this list that any occurence of 'NATURAL' in a
plan means that the query is not optimized. Is it the case here?
2. The only difference with CASE 1 is the left join, but the performance
drops is unacceptable in this case. Is there anything I can do to speed
this up?
--------------------------------------------------------
CASE 3 : Detour from CASE2
Out of curiosity, (and a bit disheartened by the results of CASE 2) , I
tried to get the barebone data itself, without any joins.
select o.productID, sum(o.qty)
from orderDetail o
group by o.productID
RESULT:
- PLAN (O ORDER RDB$FOREIGN7)
- Result is INSTANTANEOUS. (less than one second)
QUESTIONS:
1. I don't think the resultset from this and the previous one is that
much different. The only thing left is JOINING the resultset from this
query to the PRODUCT TABLE. Seeing that this query is instantaneous, I
must be doing something wrong in the above queries.
2. One possible way to interpret this result is that we should do the
hardwork of JOINS, etc, ourselves instead of relying on Firebird's
optimizer. This is a bad news for me since I liked to have a single
understandable/readable SQL statement instead of tons of little queries
to do the joins manually. Besides, I would hate to have to implement it
just to find that it will perform just as bad...:(
3. Haven't checked it yet, but could QUERY#2 be optimized by writing it
into a storedproc? (I'm thinking of the dbase style two-level-loops
here.). Sigh. And I thought SQL would help us out of the dark ages...:)
--------------------------------------------------------
CASE 4 : Find Total Sales For Each Product for certain months.
select p.productID, sum(o.qty)
from product p
left join orderdetail o on (p.productID = o.productID)
left join ordermaster m on (o.ordermasterid = m.ordermasterid)
where (m.transdate between '2003-01-01' and '2003-02-01')
group by p.productID
RESULT:
- PLAN SORT (JOIN (JOIN (P NATURAL,O INDEX
(FK_ORDERDET_REFERENCE_ITEMS)),M INDEX (PK_ORDERMASTER)))
- First Run : approx 60 seconds.
- Second run : No change, around 60 seconds.
--------------------------------------------------------
I didn't really expect the last one to perform decently, so I'm totally
at loss on how to proceed from here.

Some of the recurring nagging questions are as follows:
1. How do we decide what is 'SLOW' or what is 'NORMAL SPEED'? My current
personal limit is around 10 seconds, but I suspect this could/would
change with different hardware. (But my hardware can be considered
'midrange' right now, so...)
2. LEFT JOINs seems to be giving me no end of troubles, but in real
world (especially report-type queries) this is crucial. Any advice?
3. In the above examples, the main theme is how to find the total for a
certail table. Suppose I write a storedproc to implement this. Which one
will give better performance :
- A select procedure with one parameter (PRODUCTID), returning TOTAL for
that particular product only. If I need to know ALL product's total, I
would join the PRODUCT table with this SP.
- A select procedure with no parameter, returning ALL product's total.


Any comments/pointers are very much appreciated.
Sorry for the long-winded email. Just wanted to show you how frustrated
I am...:)

Thank you very much in advance,
sugiharto lim.