Subject Re: [IBO] Join criteria in the WHERE clause
Author Helen Borrie
At 07:23 PM 10-01-01 -0600, you wrote:
>Helen,
>
> > >Here is an example to help clarify:
> > >
> > >SELECT *
> > >FROM NAME_ADDRESS N
> > >LEFT JOIN BILLING_DATA B
> > > ON B.CO = N.CO
> > > AND B.YR = N.YR
> > > AND B.PARCEL = N.PARCEL
> > >ORDER BY N.CO, N.YR, N.PARCEL
> > >
> > >Here is the resulting plan:
> > >PLAN SORT (JOIN (N NATURAL,B INDEX (BILLING_DATA_00)))
> >
> > Specify it without "LEFT" and look at the plan then.
>
>Without the LEFT it is very fast, however I only get records contained in
>both tables. What I need is all records in N even if there is not a match in
>B (like the join in the WHERE clause produces).

No, your example with the join in the WHERE clause is an inner join. There
is no difference (in logical terms) between LEFT, RIGHT and FULL for
implicit joins.


>I tried that and it did not appear to make a difference. Now that I know
>what JoinLinks are used for, I can live with the join criteria in the WHERE
>clause.

That's fine; as long as you realise that you aren't gaining anything and
would have some degree of performance cost from the extra parsing on
searches, for example...


> > fwiw, you could eliminate the ambiguities confronting the optimizer if you
> > normalized on that CO,YR,PARCEL key structure. If it is that prevalent
> > throughout the database, it really ought to be in its own table with a
> > surrogate key which the other tables use as a link entity. Still, maybe
> > you don't want to go there...
>
>I am not sure that I understand you. The CO, YR and PARCEL fields are
>basically the primary keys for each table. At login the user selects a
>community and a year (i.e. CO and YR). Once logged in, the user works with
>parcel related information for the logged in community and year. If I moved
>CO, YR and PARCEL to a separate table and use a surrogate key instead, would
>all my select statements then become even more complex joins ?
>
>Example:
>
>SELECT KY
>FROM KEYS K
>LEFT JOIN NAME_ADDRESS N
> ON K.KY = N.KY
>LEFT JOIN BILLING_DATA B
> ON K.KY = B.KY
>WHERE K.CO = '241'
> AND K.YR = 2001
>ORDER BY K.CO, K.YR, K.PARCEL
>
>I am a firm believer in a normalized database. However, I typically do not
>normalize the primary keys. Would the above example really be more efficient
>for the optimizer ?

They would eliminate ambiguity and thus make it possible for the optimizer
to use the index for sorts. Now that you tell me the composite key is also
the primary key, I can tell you right away why you got optimizer ambiguity
with the join.

You have two identical indexes in the N table - the one associated with the
primary key and the named index that was used in the second plan.

Perhaps the thing you don't clearly see is that indices are not keys. The
purpose of indices is to enable optimization. The purpose of keys is to
form relationships. It so happens that InterBase creates an index for
every primary key automatically (it also does so for explicit FOREIGN
keys). Because IB sees two identical indexes, it has an unresolvable
ambiguity - it can't tell which index is "better" so it uses NATURAL for
the sort order. Get rid of that named index and you will find you have
cured the problem.

I can't think what you would use a query like your example for, though. To
go back to your original example, here is how the surrogate key would form
the links:

SELECT *
FROM NAME_ADDRESS N
LEFT JOIN BILLING_DATA B
ON N.KY = B.KY
WHERE N.CO='Something' AND N.YR=2000 <-- this is the rule for datasets -
limit them by WHERE criteria
ORDER BY N.CO, N.YR, N.PARCEL

will be very fast if you have an index on those three fields. KY is a key
- it doesn't (and shouldn't, for atomicity) care about the content of the
fields it stands in for. It makes the join over one pair of linked columns
instead of three. I think your confusion over keys vs. indexes is
obscuring things for you. A normalized structure is aimed at eliminating
the dependence of set definitions on keys - it's ever pushing toward the
most efficient way of limiting sets, which is the WHERE clause.

>This sounds like a very interesting idea and would certainly reduce the
>overall size of my database. In most cases I need the data in PARCEL order
>for the logged in community & year and I just want to be sure that the
>additional joins are not going to put even more of a burden on the server.

This is something you need to prototype for yourself. In general, the
cleaner the relationships, the more efficiently you can increase the number
of tables. Think of it in terms of Velcro versus buttons. If you need 5
buttons to do the work that one piece of Velcro can do and you have to wear
5 garments that have to be fastened on to you before you run for your bus,
you probably will choose to use Velcro most of the time...

Scuse the dressmaking analogy...I also use dressmaking principles to do BA
and application design. <g>

H.


All for Open and Open for All
InterBase Developer Initiative ยท http://www.interbase2000.org
_______________________________________________________