Subject RE: [ib-support] query plan
Author Pavel Cisar
Hi,

On 30 Apr 2003 at 16:55, Svein Erling Tysvaer wrote:

> Huh?

:-)

> Why must this query use natural for table NS? Table NS belongs to the bunch
> of tables at the left side of the join and so does the where clause. I
> would then think that the leftmost table of the plan could be one of the
> tables referenced in the where clause. I agree there isn't anything to do
> with the table to the right of the left join, but that table does not
> appear to be a problem. Are you saying that the plan would change more than
> just eliminating the table to the right of the left join if this table was
> removed from the query?

If you have more than two tables in join, they are processed in pairs,
because you can join only two tables at a time. As I wrote earlier, join
is performed as a loop over one (left in PLAN) set with lookup for
relevant rows from right set for each row from left. A product of join is
then joined with another table and so on. You can express the join in
oriented graph where sets are nodes, and connection lines mean joins. It
doesn't really matter in what order you write the join, optimizer can
change the order. Basically, optimizer is looking for:

a) How big are all sets involved (tables and intermediate products). Any
where condition that narrows the set is considered, but optimizer doesn't
work with precise values, but with best guess. Optimizer prefers path
that use lookup in smallest sets and with smallest intermediate products.

b) How tables could be joined (the lookup method). Faster method is
preferred.

Here are few examples that demonstrate the principle (I used tables from
sample EMPLOYEE.GDB):

JOB = 31 rows
COUNTRY = 14 rows
DEPARTMENT = 21 rows

>SELECT *FROM JOB J JOIN COUNTRY C ON J.JOB_COUNTRY =C.COUNTRY ;

PLAN JOIN (JOB NATURAL,COUNTRY INDEX (RDB$PRIMARY1))

It's an inner join, so it's possible to loop over either table with
lookup in the other. Because there is no where condition, server must
evaluate all rows from join control table (leftmost), so you'll see
NATURAL on left table). Although JOB is more than twice bigger than
COUNTRY, optimizer favour ultrafast lookup using PK.

If you'll change the join order in statement, plan doesn't change as
conditions do not change:

>SELECT *FROM COUNTRY C JOIN JOB J ON C.COUNTRY =J.JOB_COUNTRY ;

PLAN JOIN (JOB NATURAL,COUNTRY INDEX (RDB$PRIMARY1))

But very similar query may look different:

SELECT * FROM DEPARTMENT D JOIN PROJECT P ON D.MNGR_NO =P.TEAM_LEADER ;

PLAN JOIN (D NATURAL,P INDEX (RDB$FOREIGN13))

This time, optimizer favour lookup in P because P is smaller - 6 rows,
while D has 21 rows, and there isn't any PK (both join fields are FK).
Note that server have to process 21 rows in one pass and 6 rows filtered
by index for each from these 21.

And now outer join. In this example I used basically the first statement,
but this time with left outer join on country.

>SELECT *FROM COUNTRY C LEFT OUTER JOIN JOB J ON C.COUNTRY =J.JOB_COUNTRY
;

PLAN JOIN (COUNTRY NATURAL,JOB INDEX (MINSALX))

In outer join, server *must* process all relevant rows from COUNTRY, as
lookup in COUNTRY from JOB don't make requested result, but inner join
(there could be countries that doesn't have job entries). So
optimizer have no choice than loop over COUNTRY with lookup in JOB with
use of MINSALX (as a compound index, it has better selectivity than FK
defined on JOB_COUNTRY although it's irrelevant in this case). Because
there isn't any where filter condition on country, it's a natural scan.

BTW, If you're going to Fulda, don't miss my session about "Understanding
Firebird optimizer and execution plan" :-)

Best regards
Pavel Cisar

See you at the First European Firebird Conference in May in Fulda,
Germany
http://www.firebird-conference.com

http://www.ibphoenix.com
For all your upto date Firebird and
InterBase information