Subject RE: Join doesn't use index?
Author Leyne, Sean
Jacob,

> Hi. I'm trying to optimize a query that joins a master table with additional
> information from 4 other tables, and sorting on a date field.
>
> Here is my query
>
> SELECT
> T.TransNo, T.TransDate, T.VoucherNo,
> T.PayerNo, P.PayerNumber,
> T.PaymentNo, Pm.PaymentName,
> T.SupplementText,
> T.PayTypeNo, Pt.PayTypeName,
> T.DebitAmount, T.CreditAmount,
> T.StudentNo, S.StudentNumber
> FROM Trans T
> LEFT OUTER JOIN Student S ON T.StudentNo = S.StudentNo JOIN Payer P ON
> T.PayerNo = P.PayerNo JOIN Payment Pm ON T.PaymentNo =
> Pm.PaymentNo JOIN PayType Pt ON T.PayTypeNo = Pt.PayTypeNo ORDER
> BY TransDate
>
> I've got a descending index on TRANSDATE (DDL below) but not an ascending
> so I was wondering why doing
>
> ORDER BY TransDate ASC
>
> Wasn't faster than doing
>
> ORDER BY TransDate DESC
>
> So I looked at the automatically generated plan (using FlameRobin), realizing
> that the optimizer didn't use the index on TRANSDATE at all:
>
> PLAN SORT (JOIN (JOIN (T NATURAL, S INDEX (STUDENT_PK)), P INDEX
> (PAYER_PK), PM INDEX (PAYMENT_PK), PT INDEX (PAYTYPE_PK)))

Your query doesn't have any where clause, to narrow the scope of the rows to be returned. Further, you are sort a large table, as such the engine has determined that walking the master table (performing NATURAL scan) then sorting the results is a faster access method than using any TransDate index.

In the majority of cases this is the better approach, since it minimizes the number of random disk IO operations which result. Random IOs slows down query performance.

Also, are you testing using tables which have a realistic/live-like number of rows? Testing on small tables do not always show real-world results.


> Now trying to change the all joins into left outer joins
>
> SELECT
> T.TransNo, T.TransDate, T.VoucherNo,
> T.PayerNo, P.PayerNumber,
> T.PaymentNo, Pm.PaymentName,
> T.SupplementText,
> T.PayTypeNo, Pt.PayTypeName,
> T.DebitAmount, T.CreditAmount,
> T.StudentNo, S.StudentNumber
> FROM Trans T
> LEFT OUTER JOIN Student S ON T.StudentNo = S.StudentNo LEFT OUTER JOIN
> Payer P ON T.PayerNo = P.PayerNo LEFT OUTER JOIN Payment Pm ON
> T.PaymentNo = Pm.PaymentNo LEFT OUTER JOIN PayType Pt ON
> T.PayTypeNo = Pt.PayTypeNo ORDER BY TransDate
>
> Gives me the plan I'm after:
>
> PLAN JOIN (JOIN (JOIN (JOIN (T ORDER TRANS_DATE_IX, S INDEX
> (STUDENT_PK)), P INDEX (PAYER_PK)), PM INDEX (PAYMENT_PK)), PT INDEX
> (PAYTYPE_PK))
>
> And the query runs 10 times faster! Since the relations I changed are secured
> by referential integrity, there should be no difference (?) so I'm a surprised,
> that the optimizer couldn't figure that out.

You need to be careful when trying to benchmark/compare query results. There are several caches (FB Engine and OS) which can significantly skew results and make a query look faster than it is in reality -- the state of the caches when in production usage will not be the same as during your testing.

Personally, I always use the Classic engine when I am benchmarking (and disconnect/reconnect in between query runs) to get a truer picture of query performance -- the default engine cache size is much smaller than other engines, so it minimizes the 'benefit' of the engine cache, thus providing more of a 'worse case' usage scenario.


Sean