Subject RE: [firebird-support] Left Join Issue - conceptual
Author Svein Erling Tysvær
Hi Alex!

Generally, what Anderson write is correct and your attempt no. 2 and 3 are preferable. I wouldn't call the choice of PLAN position based in general, but for some reason, Firebird treats everything after a LEFT JOIN (probably similar for RIGHT and FULL JOIN) as something that cannot be moved above that part. I don't know why, but I have written SQL like

FROM A
LEFT JOIN B on A.ID = B.ID
LEFT JOIN C on A.ID = C.ID
JOIN D on D.ID = COALESCE(B.ID_D, C.ID_D)

and can accept that it can be considerably more difficult for the optimizer to separate inner joins that are dependent upon preceding left joins from inner joins that aren't, than it is for me to put inner joins that really can be inner joined before my left joins (I only have to consider my actual SQL, Firebird has to cater for all possible SQL).

To your particular problem. The optimizer sometimes do mistakes, and sometimes they just look stupid to those of us that know nothing about the Firebird code. Somehow, the optimizer thinks that it is better to go NATURAL on TBL_PARTY and then use the index three times for TBL_PAYMENT than to use indexes for both tables. I don't know why, I can only guess that either this index isn't very selective for one of these tables (in general, not talking specifically about the two PARTY_ID's you select) or your statistics are very out of date. The plan I would have expected, would be something like

PLAN JOIN (JOIN (P INDEX (PAYMENT_PARTY,PAYMENT_PARTY), PARTY INDEX((PARTY_KEY)),PERSON INDEX (PERSON_KEY))

Whenever I observe the optimizer doing a really bad choice, I normally prevent the optimizer from choosing the bad plan by using +0 or ||'' I think your use of LEFT JOIN in your first query - albeit accidentally - works in a similar manner.

There is one thing that puzzles me and that I do not understand at all, and that is that matters whether you write 'p.party_id in' or 'party.party_id in'. My experience says that when these two tables are inner joined on that column, then Firebird itself knows how to substitute these to get the best plan. You do not by any chance have several indexes covering the same column (e.g. if party_id was both the primary key and explicitly indexed for tbl_party)? Firebird is known to make stupid plans when there are two identical indexes to choose from.

Set

-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Alexander Tabakov
Sent: 2. juni 2008 15:48
To: firebird-support@yahoogroups.com
Subject: [firebird-support] Left Join Issue - conceptual

Note: Sorry for hijacking the other thread. My mistake!
=======================================================

Hi all,

I run a simple query on Firebird 1.5.5:

select *
from
tbl_party party left join tbl_person person on person.party_id =
party.party_id
join tbl_payment p on p.party_id = party.party_id
where
party.party_id in (xxxx, yyyy) // Two parties

All fields are indexed and here is the query plan:
PLAN JOIN (JOIN (PARTY INDEX (PARTY_KEY,PARTY_KEY),PERSON INDEX
(PERSON_KEY)),P INDEX (PAYMENT_PARTY))

Execution is just fine - 68 rows in the result set with the following
details:
tbl_party - 2 indexed reads
tbl_person - 2 indexed reads
tbl_payment - 68 indexed reads

So fas so good. What bothers me is that when I rewrite the query as follows:

select *
from
tbl_party party join tbl_payment p on p.party_id = party.party_id
--> left join tbl_person person on person.party_id = party.party_id
where
party.party_id in (xxxx, yyyy) // Two parties

I get a huge execution penalty while results are the same:

PLAN JOIN (JOIN (PARTY INDEX (PARTY_KEY,PARTY_KEY),P INDEX
(PAYMENT_PARTY)),PERSON INDEX (PERSON_KEY))

tbl_party - 2 indexed reads
tbl_person - 68 indexed reads
tbl_payment - 68 indexed reads

Is result set creation somewhat "position based" depending on the actual
join sequence?
If so, is it true for left/right joins only? Is there any rule of thumb
where to place the left joins?

When I modify the former query as follows:

select *
from
tbl_party party join tbl_payment p on p.party_id = party.party_id
left join tbl_person person on person.party_id = party.party_id
where
p.party_id in (xxxx, yyyy) // Two parties

Results become scary :)

PLAN JOIN (JOIN (PARTY NATURAL,P INDEX
(PAYMENT_PARTY,PAYMENT_PARTY,PAYMENT_PARTY)),PERSON INDEX (PERSON_KEY))

tbl_party - 456'000 indexed reads
tbl_person - 68 indexed reads
tbl_payment - 68 indexed reads

Each of these queries return the same result set and essentially operate
on the same set of data uses one and the same keys, etc.

I would definitely like to here your comments, suggestions, etc.

Best regards and thanks in advance
Alex