Subject | Re: [IBO] Join criteria in the WHERE clause |
---|---|
Author | Helen Borrie |
Post date | 2001-01-11T02:34:01Z |
At 07:23 PM 10-01-01 -0600, you wrote:
is no difference (in logical terms) between LEFT, RIGHT and FULL for
implicit joins.
would have some degree of performance cost from the extra parsing on
searches, for example...
to use the index for sorts. Now that you tell me the composite key is also
the primary key, I can tell you right away why you got optimizer ambiguity
with the join.
You have two identical indexes in the N table - the one associated with the
primary key and the named index that was used in the second plan.
Perhaps the thing you don't clearly see is that indices are not keys. The
purpose of indices is to enable optimization. The purpose of keys is to
form relationships. It so happens that InterBase creates an index for
every primary key automatically (it also does so for explicit FOREIGN
keys). Because IB sees two identical indexes, it has an unresolvable
ambiguity - it can't tell which index is "better" so it uses NATURAL for
the sort order. Get rid of that named index and you will find you have
cured the problem.
I can't think what you would use a query like your example for, though. To
go back to your original example, here is how the surrogate key would form
the links:
SELECT *
FROM NAME_ADDRESS N
LEFT JOIN BILLING_DATA B
ON N.KY = B.KY
WHERE N.CO='Something' AND N.YR=2000 <-- this is the rule for datasets -
limit them by WHERE criteria
ORDER BY N.CO, N.YR, N.PARCEL
will be very fast if you have an index on those three fields. KY is a key
- it doesn't (and shouldn't, for atomicity) care about the content of the
fields it stands in for. It makes the join over one pair of linked columns
instead of three. I think your confusion over keys vs. indexes is
obscuring things for you. A normalized structure is aimed at eliminating
the dependence of set definitions on keys - it's ever pushing toward the
most efficient way of limiting sets, which is the WHERE clause.
cleaner the relationships, the more efficiently you can increase the number
of tables. Think of it in terms of Velcro versus buttons. If you need 5
buttons to do the work that one piece of Velcro can do and you have to wear
5 garments that have to be fastened on to you before you run for your bus,
you probably will choose to use Velcro most of the time...
Scuse the dressmaking analogy...I also use dressmaking principles to do BA
and application design. <g>
H.
All for Open and Open for All
InterBase Developer Initiative ยท http://www.interbase2000.org
_______________________________________________________
>Helen,No, your example with the join in the WHERE clause is an inner join. There
>
> > >Here is an example to help clarify:
> > >
> > >SELECT *
> > >FROM NAME_ADDRESS N
> > >LEFT JOIN BILLING_DATA B
> > > ON B.CO = N.CO
> > > AND B.YR = N.YR
> > > AND B.PARCEL = N.PARCEL
> > >ORDER BY N.CO, N.YR, N.PARCEL
> > >
> > >Here is the resulting plan:
> > >PLAN SORT (JOIN (N NATURAL,B INDEX (BILLING_DATA_00)))
> >
> > Specify it without "LEFT" and look at the plan then.
>
>Without the LEFT it is very fast, however I only get records contained in
>both tables. What I need is all records in N even if there is not a match in
>B (like the join in the WHERE clause produces).
is no difference (in logical terms) between LEFT, RIGHT and FULL for
implicit joins.
>I tried that and it did not appear to make a difference. Now that I knowThat's fine; as long as you realise that you aren't gaining anything and
>what JoinLinks are used for, I can live with the join criteria in the WHERE
>clause.
would have some degree of performance cost from the extra parsing on
searches, for example...
> > fwiw, you could eliminate the ambiguities confronting the optimizer if youThey would eliminate ambiguity and thus make it possible for the optimizer
> > normalized on that CO,YR,PARCEL key structure. If it is that prevalent
> > throughout the database, it really ought to be in its own table with a
> > surrogate key which the other tables use as a link entity. Still, maybe
> > you don't want to go there...
>
>I am not sure that I understand you. The CO, YR and PARCEL fields are
>basically the primary keys for each table. At login the user selects a
>community and a year (i.e. CO and YR). Once logged in, the user works with
>parcel related information for the logged in community and year. If I moved
>CO, YR and PARCEL to a separate table and use a surrogate key instead, would
>all my select statements then become even more complex joins ?
>
>Example:
>
>SELECT KY
>FROM KEYS K
>LEFT JOIN NAME_ADDRESS N
> ON K.KY = N.KY
>LEFT JOIN BILLING_DATA B
> ON K.KY = B.KY
>WHERE K.CO = '241'
> AND K.YR = 2001
>ORDER BY K.CO, K.YR, K.PARCEL
>
>I am a firm believer in a normalized database. However, I typically do not
>normalize the primary keys. Would the above example really be more efficient
>for the optimizer ?
to use the index for sorts. Now that you tell me the composite key is also
the primary key, I can tell you right away why you got optimizer ambiguity
with the join.
You have two identical indexes in the N table - the one associated with the
primary key and the named index that was used in the second plan.
Perhaps the thing you don't clearly see is that indices are not keys. The
purpose of indices is to enable optimization. The purpose of keys is to
form relationships. It so happens that InterBase creates an index for
every primary key automatically (it also does so for explicit FOREIGN
keys). Because IB sees two identical indexes, it has an unresolvable
ambiguity - it can't tell which index is "better" so it uses NATURAL for
the sort order. Get rid of that named index and you will find you have
cured the problem.
I can't think what you would use a query like your example for, though. To
go back to your original example, here is how the surrogate key would form
the links:
SELECT *
FROM NAME_ADDRESS N
LEFT JOIN BILLING_DATA B
ON N.KY = B.KY
WHERE N.CO='Something' AND N.YR=2000 <-- this is the rule for datasets -
limit them by WHERE criteria
ORDER BY N.CO, N.YR, N.PARCEL
will be very fast if you have an index on those three fields. KY is a key
- it doesn't (and shouldn't, for atomicity) care about the content of the
fields it stands in for. It makes the join over one pair of linked columns
instead of three. I think your confusion over keys vs. indexes is
obscuring things for you. A normalized structure is aimed at eliminating
the dependence of set definitions on keys - it's ever pushing toward the
most efficient way of limiting sets, which is the WHERE clause.
>This sounds like a very interesting idea and would certainly reduce theThis is something you need to prototype for yourself. In general, the
>overall size of my database. In most cases I need the data in PARCEL order
>for the logged in community & year and I just want to be sure that the
>additional joins are not going to put even more of a burden on the server.
cleaner the relationships, the more efficiently you can increase the number
of tables. Think of it in terms of Velcro versus buttons. If you need 5
buttons to do the work that one piece of Velcro can do and you have to wear
5 garments that have to be fastened on to you before you run for your bus,
you probably will choose to use Velcro most of the time...
Scuse the dressmaking analogy...I also use dressmaking principles to do BA
and application design. <g>
H.
All for Open and Open for All
InterBase Developer Initiative ยท http://www.interbase2000.org
_______________________________________________________