firebird-general - RE: [IBDI] Digest Number 419

Subject	RE: [IBDI] Digest Number 419
Author	Claudio Valderrama C.
Post date	2001-08-26T23:38:15Z

> -----Original Message-----
> From: Syarzhuk Kazachenka [mailto:bamboo7431@...]
> Sent: Domingo 26 de Agosto de 2001 11:03
>
> >The ambiguity here is that the statement does not identify which table's
> >compname to order by and the results are somewhat unpredictable -
> >contradictory, even, if you compare database versions.
>
> I don't get it. If you're doing an equijoin (a.compname =
> b.compname) than
> does it really matter whether the DB engine will use a.compname or
> b.compname for sorting the result set? I understand this could be
> the reason
> for underperformance on the server side, but the RESULTS of the SQL query
> should be the same and not "somewhat unpredictable". Am I missing
> something?

You're right, Syarzhuk. Probably Helen tried to simplify the example. With
such a simple case, only with an INNER JOIN and with one condition, there
will be no much difference. However, detecting this special innocuous case
INSIDE the engine is not easy as it seems to be. Hence, this case will be
either allowed as before or stopped with all the rest, too.

The painful cases happen using left outer joins with sort, group by and
virtually any expression that involves the ambiguous field, like an innocent
cast:

select a.compname from company a
left join contractor c
on a.compname = c.compname
order by compname

Since every company is expected to have a name, sorting by a.compname gives
"right" results, but sorting by c.compname doesn't necessarily render the
recordset in the same order, since for each company that doesn't have a
contractor, c.compname will be null. Hence it's put at the tail.
The problem is the user. The engine has been forgiving for years. We don't
want to forgive users for their ambiguous statements. Sometimes you don't
realize that two tables have the same field name and wonder why your
statement doesn't do what you want, until you discover the cause and exclaim
"it was too easy, why didn't I see it before?".

The example posted by Helen is an old-style, implicit join. It should be
clear with an explicit outer join, let's use employee.gdb:

select a.department, b.dept_no from department a
left join employee b on a.dept_no=b.dept_no
order by dept_no

What does the server do? It sorts by b.dept_no and hence, you see
Field Office: Singapore
Software Products Div.
at the tail. Using
order by a.dept_no
instead will put them where most users expected: in the middle of the list,
because their numbers are 116 & 620 respectively. You will observe same
anomaly with GROUP BY, a CAST and even any innocent comparison like
where fieldA = fieldB
may change its meaning if there're more than one fieldA in the tables that
make up the SELECT statement. See this confusing result:

select a.department, a.dept_no from department a
left join employee b on a.dept_no=b.dept_no
where dept_no is null
order by a.dept_no
DEPARTMENT DEPT_NO
================================
Field Office: Singapore 116
Software Products Div. 620

At first glance, I would shout "I said IS NULL but the DEPT_NO column has
values!". Well, I specified a.dept_no in the output but the WHERE clause
assumed b.dept_no, hence the server really did the right thing (for it)
after picking one of the ambiguous fields: it selected cases where b.dept_no
is null.

I don't think it's necessary to post real-life, complex examples to show the
problem.

C.