Subject | Re: [firebird-support] Re: Valid reasons for sorting on a column one doesn't select? - a BUG! |
---|---|
Author | Helen Borrie |
Post date | 2005-08-14T12:20:50Z |
At 11:09 AM 14/08/2005 +0000, you wrote:
The standard declares the question of *which* row is taken as "the one" in
a DISTINCT set as implementation dependent. So - where Oracle and Firebird
might happen to take the first one encountered (taken in no particular
order) and discard the rest, another engine might process the candidates
differently, discarding all but the last. Since it is impossible to perform
a positioned update on a DISTINCT cursor, it's irrelevant in terms of output.
Considering that the entire set must be walked in order to determine
distinctness, it would make sense for the optimizer to do the sort first,
in an attempt to shorten the journey. And that's just as it appears to be.
I just happen to have a database handy for testing this: it's the member
database of the Firebird Foundation.
If I do (with no order by or explicit group by)
select distinct first_names from person
I get an *ordered list* of the first_names. Under the hood, one has to
suspect that the optimizer does a GROUP BY first_names to get the distinctness.
However, if I do
select distinct first_names from person
order by surname
I do get the list in surname order - BUT
-- in the database are two people called Dmitry. One has the surname
Yemanov, the other Kouzmenko. If the records in this case were being taken
as "first come, first served, I would expect Yemanov to be selected and
Kouzmenko to be discarded, because Yemanov's unique id is 16, while
Kouzmenko's is 135. That is, the Yemanov record has the lower cardinality
in the table.
Yet, in the output list, we find Dmitry sandwiched between Alexander
(Klenin) and Marco Antonio (Kregar). Klenin "wins" in the same way. We
have two Alexanders, with A. Nevsky's record having a lower cardinality
than A. Klenin's.
The deduction is that Firebird's implementation rule works along the lines
of..."use GROUP BY unless there is an ORDER BY clause". If the GROUP BY
logic *is* the way Firebird processes DISTINCT, then it's just as obvious
that it must abandon GROUP BY if there is an ORDER BY column that is not in
the SELECT list -- otherwise, the engine would have to break its own
grouping rules.
However, it's a stretch to call it a "bug" to allow the query which, though
silly, does no harm. Ordering is logically *possible*, even if's not
sensible to anticipate that it will be meaningful to most people, and it
breaks nothing. If someone wants to do something like that, why should the
engine prevent them?
./heLen
>Firstly, thanks to all those who gave meNowhere. It isn't a bug, it's an opinion. :-)
>valid reasons why one might want to sort
>on a column not present in the select.
>
>However, on one of the Oracle groups, one
>thoughtful individual explained to me why allowing
>this construct is a bug.
>
>
> > SELECT DISTINCT(First_Name)
> > FROM Employee
> > ORDER BY Last_Name
>
>
>------
>Since John can only show up once in the output, is
>it at the top of the sorted list (John Andrews),
>in the middle (John Meany), or at the end (John
>White)?
>---------
>
>
>I had stumbled onto this myself, wondering about
>which last name got eliminated in the sort, and
>it doesn't appear to be well defined. I think
>that Firebird should *_NOT_* allow ORDER BY's on
>aggregate functions in this manner.
>
>SELECT DISTINCT(First_Name)
>FROM Employee
>ORDER BY First_Name
>
>is a valid query, and works in both FB and Oracle.
>
>But allowing an order by on a separate field
>in an aggregate simply doesn't make sense.
>
>
>Where should I report this bug in Firebird?
The standard declares the question of *which* row is taken as "the one" in
a DISTINCT set as implementation dependent. So - where Oracle and Firebird
might happen to take the first one encountered (taken in no particular
order) and discard the rest, another engine might process the candidates
differently, discarding all but the last. Since it is impossible to perform
a positioned update on a DISTINCT cursor, it's irrelevant in terms of output.
Considering that the entire set must be walked in order to determine
distinctness, it would make sense for the optimizer to do the sort first,
in an attempt to shorten the journey. And that's just as it appears to be.
I just happen to have a database handy for testing this: it's the member
database of the Firebird Foundation.
If I do (with no order by or explicit group by)
select distinct first_names from person
I get an *ordered list* of the first_names. Under the hood, one has to
suspect that the optimizer does a GROUP BY first_names to get the distinctness.
However, if I do
select distinct first_names from person
order by surname
I do get the list in surname order - BUT
-- in the database are two people called Dmitry. One has the surname
Yemanov, the other Kouzmenko. If the records in this case were being taken
as "first come, first served, I would expect Yemanov to be selected and
Kouzmenko to be discarded, because Yemanov's unique id is 16, while
Kouzmenko's is 135. That is, the Yemanov record has the lower cardinality
in the table.
Yet, in the output list, we find Dmitry sandwiched between Alexander
(Klenin) and Marco Antonio (Kregar). Klenin "wins" in the same way. We
have two Alexanders, with A. Nevsky's record having a lower cardinality
than A. Klenin's.
The deduction is that Firebird's implementation rule works along the lines
of..."use GROUP BY unless there is an ORDER BY clause". If the GROUP BY
logic *is* the way Firebird processes DISTINCT, then it's just as obvious
that it must abandon GROUP BY if there is an ORDER BY column that is not in
the SELECT list -- otherwise, the engine would have to break its own
grouping rules.
However, it's a stretch to call it a "bug" to allow the query which, though
silly, does no harm. Ordering is logically *possible*, even if's not
sensible to anticipate that it will be meaningful to most people, and it
breaks nothing. If someone wants to do something like that, why should the
engine prevent them?
./heLen