Subject Re: [IBO] Join criteria in the WHERE clause
Author Chris Landowski
Helen,

> >Without the LEFT it is very fast, however I only get records contained in
> >both tables. What I need is all records in N even if there is not a match
in
> >B (like the join in the WHERE clause produces).
>
> No, your example with the join in the WHERE clause is an inner join.
There
> is no difference (in logical terms) between LEFT, RIGHT and FULL for
> implicit joins.

My mistake, I never bothered to check record counts to ensure that the 2
example SQL statements produced the exact same results. I guess the implicit
join in the where clause will not solve my problem since I need a LEFT JOIN
:(

> They would eliminate ambiguity and thus make it possible for the optimizer
> to use the index for sorts. Now that you tell me the composite key is
also
> the primary key, I can tell you right away why you got optimizer ambiguity
> with the join.
>
> You have two identical indexes in the N table - the one associated with
the
> primary key and the named index that was used in the second plan.

Actually, I do not have any primary keys on any tables. I am not using any
referential integrity and I wanted the generated PLANs to be more easily
readable. Seeing RDB$PRIMARY93 in a plan is a lot more confusing then
NAME_ADDRESS_00. I am only using named indexes for optimization purposes.
When I said that CO, YR, PARCEL was the primary key, I should have said that
it was an index that uniquely identified each record (sorry !). Will this
approach come back to haunt me down the road ? Are primary keys used for any
purpose other then RI ?

I have about 10 other indexes on the NAME_ADDRESS table (no duplicates
though). Is it possible that one of these could be causing the unresolvable
ambiguity ? If so, what plan can I use to force the LEFT JOIN to use the
NAME_ADDRESS_00 index instead of NATURAL ?

> I can't think what you would use a query like your example for, though. To
> go back to your original example, here is how the surrogate key would form
> the links:
>
> SELECT *
> FROM NAME_ADDRESS N
> LEFT JOIN BILLING_DATA B
> ON N.KY = B.KY
> WHERE N.CO='Something' AND N.YR=2000 <-- this is the rule for datasets -
> limit them by WHERE criteria
> ORDER BY N.CO, N.YR, N.PARCEL
>
> will be very fast if you have an index on those three fields. KY is a key
> - it doesn't (and shouldn't, for atomicity) care about the content of the
> fields it stands in for. It makes the join over one pair of linked
columns
> instead of three.

I think I understand. Are you saying leave the CO, YR, PARCEL fields in the
tables so that they can be used for a WHERE clause but add a KY field that
links the records together so that it can be used to optimize LEFT JOINs ? I
initially thought you were saying, replace CO, YR, PARCEL with KY in all
tables (i.e. extreme normalization).

Thanks for all your help,

Chris Landowski
Dynamic Software Solutions