Subject RE: [ib-support] Summarization Query Speed Issue
Author Svein Erling Tysvaer
Hi Sugi, I've got a few more comments:

At 00:39 05.03.2003 +0700, you wrote:
> > In general, client/server databases appears to be slow when
> > you select entire tables rather than specific records.
> > Always try to use as limiting a where clause as possible
> > and never return any redundant records (that is
> > the big jump from desktop databases).
>Thanks for the input. I agree that for general use, 'select *' is
>inefficient, But sometimes for reporting or data warehouse queries,
>'select *' is the only way to go.

Oh, I wasn't thinking about whether you use select * or select <column
name>. This bit was advising to add WHERE <some criteria> to reduce the
number of rows returned.

> > If yes how does this perform :
> > SELECT
> > p.productID,
> > (SELECT SUM(o.qty) FROM OrderDetail o
> > WHERE o.productid = p.productid)
> > FROM Product p
> > ORDER BY p.productID
>PLAN (O INDEX (FK_ORDERDET_REFERENCE_ITEMS)) PLAN (P NATURAL)
> From my limited testing, the performance is quite good, around 2
>seconds. Now this baffles me since I was told once that 'correlated
>subquery' could not be faster than a straightforward join...

Whomever said this probably meant [INNER] JOIN, not LEFT (or RIGHT) [OUTER]
JOIN.

> > select p.productID, sum(o.qty)
> > from product p
> > left join orderdetail o on (p.productID = o.productID)
> > join ordermaster m on (o.ordermasterid = m.ordermasterid and
> > m.transdate between '2003-01-01' and '2003-02-01')
> > group by p.productID
>PLAN SORT (JOIN (JOIN (P NATURAL,
> O INDEX (FK_ORDERDET_REFERENCE_ITEMS)),
> M INDEX (PK_ORDERMASTER)))
>Execution time is around 50 seconds. (As opposed to ~60 and ~51 seconds
>for the previous version). An improvement, but not much.

I didn't suggest this because of the speed, but because your original query
may or may not give correct results. Your original query asked for all
records in the product table and then left joined ordermaster. The where
clause is used for restricting returned rows and by putting something from
ordermaster there, you are at the same time telling Fb to return all
records of product and reduce it to those who link to a record with
ordermaster.transdate between two dates. Basically, this is telling Fb to
only give you those rows of ordermaster having a transdate between your
dates, but at the same time let Fb itself choose whether it wants to return
all records of product or only those who link to an ordermaster with the
correct dates.

>Based on mr Arno Brinkman's suggestion, here's the 'correlated subquery'
>version that I came up with:
>...
>SELECT
> p.productID,
> (SELECT SUM(o.qty)
> FROM OrderDetail o join ordermaster m
> on (o.ordermasterid = m.ordermasterid)
> and (m.orderdate between '2003-01-01' and '2003-01-02')
> WHERE o.productid = p.productid)
>FROM
> Product p
>...
>PLAN JOIN (O INDEX (FK_ORDERDET_REFERENCE_ITEMS),M INDEX
>(PK_ORDERMASTER)) PLAN (P NATURAL)
>Execution time is almost instant (around 1 seconds), the fetching is a
>different story altogether...:)

In this case you are only doing an inner join between OrderDetail and
OrderMaster and your WHERE clause is for the subselect. Here, I don't see
the point in having the m.orderdate within the join - I would have put it
in the where clause. Though only for clarity, it shouldn't make any speed
difference and your query as it stands should produce correct results.

Set