Subject Re: Query Optimization (Arno, where are you?)
Author Svein Erling Tysvær
Hi Juan!

> All relations are done in FOREIGN KEYS, I tried some INNER JOINS
> where I thought it could help, and some RIGHT JOINS where it could
> help too, but It didn't at all.

FROM A LEFT JOIN B is the same as FROM B RIGHT JOIN A (both very
different from FROM A JOIN B, which in general is more efficient).

> The 237 records that I need are :
> ********MINIMUM SQL***********
> SELECT B.TB_INSPECT_ID
> FROM TB_INSPECT_GROUP A LEFT JOIN TB_INSPECT B
> ON B.INSPECTION_GROUP_ID = A.TB_INSPECT_GROUP_ID
> WHERE
> A.DEST_LOT_ID IS NULL
> AND A.SO_DET_ID IS NOT NULL
> AND B.WRITE_UP_DET_ID IS NULL
> AND B.STATUS IN ('PASSED','USE AS IS')
> ********END MINIMUM SQL*********
> Here table A is visited 69,197 times
> and table B is visited 205,865 times
> and 237 records are returned in about two seconds.
>
> ******MODIFIED VERSION OF MINIMUM SQL************
> SELECT B.TB_INSPECT_ID
> FROM TB_INSPECT_GROUP A LEFT JOIN TB_INSPECT B
> ON (B.INSPECTION_GROUP_ID = A.TB_INSPECT_GROUP_ID) and
> (A.DEST_LOT_ID IS NULL AND A.SO_DET_ID IS NOT NULL) <<--
> WHERE
> A.DEST_LOT_ID IS NULL
> AND A.SO_DET_ID IS NOT NULL
> AND B.WRITE_UP_DET_ID IS NULL
> AND B.STATUS IN ('PASSED','USE AS IS')
> ******MODIFIED VERSION OF MINIMUM SQL************
> Here table A is visited 67,159 times
> and table B is visited 47,048 times
> and the same 237 records are returnes in only one second!

Since A is on the left side of the join, I think these two statements
should be identical. However, outer joins is something I use very
rarely (in fact, whenever I do so, I only limit the right side of the
left join (B) within the join itself and never in the where clause,
but I think that could give you a different result), and know far too
little about how the optimizer behaves with them and why.

> I suppose that because the WHERE CLAUSE only applies to the first
> two tables, the other five tables (D,E,F,G & H) should only be
> visited 237 times. Am I wrong? why?

Logically, I'd agree with you, but the optimizer is much more
optimized than my brain. From an earlier mail, you report your plan to
be:

PLAN JOIN (JOIN (JOIN (JOIN (JOIN (JOIN (A INDEX (RDB$FOREIGN132),B
INDEX (RDB$FOREIGN112)),D INDEX (RDB$PRIMARY53)),E INDEX
(RDB$PRIMARY38)),F INDEX (RDB$PRIMARY57)),G INDEX (TB_SO_IDX)),H
INDEX (RDB$PRIMARY52))

I read this as the resultset of A and B will be joined with D and that
result will in turn be joined with E and so on (even though it is done
by some kind of calculation and not by getting the result set itself).
Though my guess is that for each level things become more complicated
and the joins more expensive.

> I think that the problem is that the WHERE CLAUSE is not being
> applied BEFORE THE JOIN (Are ther some technical limitations?). To
> prove this, I've got an improve by putting part of the CONDITION
> (A.DEST_LOT_ID IS NULL AND A.SO_DET_ID IS NOT NULL --Line 16) on the
> ON clause of the JOIN. See new version at the end!
>
> ********** NEW QUERY VERSION skipped *******
>
> I think, that this is a problem of the SQL optimizer for the
> development group. Should I post it there?

Maybe, but let's see if Arno gets attracted to the revised title and
has some input. It will be interesting to see if he can tell the
difference between your two apparently identical statements.

Sorry for not being able to help any more,
Set