Subject Re: [ib-support] Re: Please remove the ambiguity check
Author Claudio Valderrama C.
""Ann W. Harrison"" <aharrison@...> wrote in message
news:5.1.0.14.2.20011211110442.04290b48@......
>
> I was wrong. Claudio is absolutely correct that the ambiguity checking
> is done in DSQL and that all this ruckus has been about that. The engine
> never sees an ambiguous field reference - it can't be expressed in BLR.

I don't think that the conclusion is that you are wrong, but what's your
point of view: DSQL or the core engine. Let's say that IB's proprietary,
high level language is GDML. To me, there's no way to write ambuiguous field
references in GDML, since all fields are qualified and this seems to be
mandatory. So, GDML users probably never had a chance to complain... they
were used to fully qualified fields. The only internal language, BLR, works
with streams, rivers, lagoons (no lagoons no, unless we have a leak <g>) and
other concepts. There are several ways to visualize BLR in a human-readable
way to understand how a procedure or a simple SELECT is expressed.

Now, when SQL was crowned as the widely accepted language (language whose
standardarization is a joke if you compare db engines), it was declarative,
not procedural and didn't demand the users to qualify fields. I wonder what
do other engines with ambiguities. In the case of IB, the people that
developed or enhanced the DSQL layer chose to be "user friendly" and take
away from the user the burden of prefixing fiields with their owning table
or procedure (in the case of selectable stored procedures). It makes sense
for me in the typical case of one table. It starts to be confusing when you
have a sub-select. It becomes a problem when you have an inner join and
turns worse when you have an outer join.

For years, Borland NGs have been plagued by occasional posts from smart
people that realized that changing the order of table names in the FROM
clause could alter totally the result set. I found that the rule is: given
an ambiguous field, IB will pick the last table in the FROM list. This can
spoil totally the intended result from an outer join, for example. Worse,
someone posted that FB was returning different "random" results than IB in
this regard. Initially, the ambiguity check did a warning, but let's face
it, few tools care about warnings. They are poorly documented if documented
at all, including how to detect them.

Ann said that she was prepared to defend that change because the added
nuisance means no accidental wrong results. The rejection of those cases was
made mandatory. No warning, but error. The original code wasn't nice. I
thought I could report to the caller functions the number of times a field
is repeated. Once in the road, the idea proved to be nightmarish. I passed a
pointer to some functions to get back the result, but since there can be
several recusive invocations of the parsing code in the DSQL layer, it was
possible to get a number greater than one without having any ambiguity. The
original idea was to say "there's an ambiguity in the GROUP BY clause", for
example. Since it was abandoned, several calls piggybacked a parameter that
was no longer in use. The new code simply says that there's an ambiguous
field, gives the field name and the tables that share it. If you have a
40-lines SQL statement, good luck!

For anyone keeping an eye in fb-devel, it's evident that the code has to
undergo several modifications to not stop valid statements. At some time
between RC1 and RC2, Ann decided to rewrite the code. It looks better now,
without the legacy parameter that was abandoned. Any code rewriting is
traumatic and that wasn't the exception, just ask Ann. <g> But someone
discovered that the legacy Visigenic driver produces some ambiguous
statements, hence it couldn't query metatada. (I will keep to myself the
comments why someone would want to stick to the Visigenic or the Merant
drivers, when you have Gorham and Karyakin among others, both offering good
ODBC drivers.) So Ann made the ambiguity a warning in dialect 1 but an error
in dialect 3. Without need to say, the legacy ODBC drivers are not-dialect
aware, that in practical terms gets translated into dialect 1. This is how
FB RC2 will work. There's no config param, sorry.

I don't want to play russian roulette with my data. There's no official
documentation that explains which field should be picked up by the engine
when the user doesn't identify it properly and the field happens in more
than one table. When there's no official behavior, there's no guarantee.
Even more, the final effect can vary from version from version. The typical
answer from tech support would be that you can't rely on implementation
details. For example, Ann changed several places that deal with SQL views,
so we can't guarantee that FB would pick the same instance of the ambiguous
field than IB. Personally, I deem worse to have results being spoiled under
the hood than having an error condition that halts my suspicious statement.
Do you remember how multiplication with numeric fields changed silently in
IB even in dialect 1 when compared to IB 5.6?
http://sourceforge.net/tracker/index.php?func=detail&aid=221589&group_id=902
8&atid=109028
Data became spoiled silently, no error about numeric overflow and worse,
with values that IB5.6 handled correctly. People that reported the glitch
surely would have preferred an error raised the first time instead of
mangled data.

I know that this case may be a pain:
select f, f2 from t1 join t2 on t1.f = t2.f
order by f;

What happens when "f" is shared by both tables? Nothing. Since this is an
inner join, whatever field the engine selects does the trick for the
sorting. However, I found the detection of those simple cases to be rather
tricky so they need full qualification anyway if "f" exists both in t1 and
t2. Consider this second example, it may make sense for some cases:

I know that this case may be a pain:
select f, f2 from t1 join t2 on t1.f = t2.f
order by a;

In this case, "a" doesn't appear in the SELECT section, but it may or it may
not exist in both tables. If it's shared by both tables, since it doesn't go
into the inner join's condition, there's no assurance that the result is the
expected, because values in both namesake columns may differ completely.
Hence, dialect 3 will reject the construction.

C.
--
Claudio Valderrama C. - http://www.cvalde.com - http://www.firebirdSql.org
Independent developer
Owner of the Interbase® WebRing