Subject Re: [firebird-support] Re: outer join help
Author Svein Erling Tysvaer
Hi Ed!

> totally agree, but I've never seen an join that is easy to read, I
> generally stay with joins in the where clause.

Reading this, I was tempted to answer that I've never seen a join that
was difficult to read, but thinking about it for one second made me
realise that to be very wrong. Though I have never seen a query that
cannot be equally easy or easier to read with a JOIN that with joining
in the WHERE clause.

OUTER JOINS are more complex than inner joins, but it isn't immediately
visible:

select C.Client_ID, PE.Client_ID
from clientinfo C
left join PersonInfo PE
on C.Client_ID = PE.Client_ID

is equivalent to

select C.Client_ID, PE.Client_ID
from clientinfo C, PersonInfo PE
where C.Client_ID = PE.Client_ID
union
select C.Client_ID, NULL
from clientinfo C2
where not exists(select * from PersonInfo PE2 where PE2.Client_ID =
C2.Client_ID)

He00nce, although some people (like you) prefer sql-89 for inner joins,
I would expect them to agree with me that sql-92 is better for outer joins.

Let's get back to your original sql:

> select Count(*)
> from clientinfo C
> left join PersonInfo PE
> on C.Client_ID = PE.Client_ID
> full join PaycheckInfo Pay on PE.Person_ID = Pay.Person_ID
> where PE.person_type_id = 0
> and Pay.sequence = 1

This IS a very confusing block of sql. You want the rows of C regardless
of whether they match any row in PE, whereas from PE you want only the
rows that match C.Client_ID. Then you want all rows of Pay, joined to PE
if applicable. So far, it is probably OK - though I've never seen that
kind of left join/full join combination before.

But then you say that you only want to return those tuples that have
particular values for PE and PAY, contradicting what you just said that
you wanted C regardless of value of PE. Hence, I think the above
statement is identical to:

select Count(*)
from clientinfo C
join PersonInfo PE
on C.Client_ID = PE.Client_ID
join PaycheckInfo Pay on PE.Person_ID = Pay.Person_ID
where PE.person_type_id = 0
and Pay.sequence = 1

or (sql-89):

select Count(*)
from clientinfo C, PersonInfo PE, PaycheckInfo Pay
where C.Client_ID = PE.Client_ID
and PE.Person_ID = Pay.Person_ID
and PE.person_type_id = 0
and Pay.sequence = 1

What I think you're looking for, is this:

select <fields> //probably more sensible than count(*)
from clientinfo C
left join PersonInfo PE
on PE.Client_ID = C.Client_ID
and PE.person_type_id = 0
left join PaycheckInfo Pay
on Pay.Person_ID = PE.Person_ID
and Pay.sequence = 1

In your plan you have two NATURAL, that is always time consuming (one is
often OK).

HTH,
Set

Ed Dressel wrote:
0>> Not like that! :-)
>
> That's why I asked :-)
>
>> Tip: unless you have to save white space, it is a REALLY good idea
>> to lay out your multi-table statements in a way that makes the
>> linkages obvious. Even if you don't care, you *might* retire some
>> day and bequeath your code to someone else. :-)
>
> totally agree, but I've never seen an join that is easy to read, I
> generally stay with joins in the where clause.
>
> I got it to work, kinda. Here's my SQL:
>
> select Count(*)
> from clientinfo C
> left join PersonInfo PE
> on C.Client_ID = PE.Client_ID
> full join PaycheckInfo Pay on PE.Person_ID = Pay.Person_ID
> where PE.person_type_id = 0
> and Pay.sequence = 1
>
> It returns 2,681 records--and takes 110 seconds to run. If I change it to:
>
> select Count(*)
> from clientinfo C
> left join PersonInfo PE
> on C.Client_ID = PE.Client_ID
> where PE.person_type_id = 0
>
> it returns 2,744 records and only takes 31 ms to run.
>
> Questions:
>
> (1) why doesn't it return the same number of records? That is what I
> need. (I need a field value from the PaycheckInfo table for every
> record in ClientInfo, but the record in PersonInfo and in PaycheckInfo
> may not exist)
>
> (2) The plan for the first one is
>
> Plan
> PLAN JOIN (PAY NATURAL,JOIN (C NATURAL,PE INDEX (PERSONINFO_CT1)))
>
> Adapted Plan
> PLAN JOIN (PAY NATURAL,JOIN (C NATURAL,PE INDEX (PERSONINFO_CT1)))
>
> What causes it to take so long? Paycheck.Client_Id is a FK,
> (Person_ID; Sequence) is indexed. I added an index to just Sequence
> but that did not speed anything up. What should I look at?
>
> Thanks