Subject | Re: Natural Plan when grouping by 5 indexed columns? |
---|---|
Author | Adam |
Post date | 2007-02-01T21:37:36Z |
--- In firebird-support@yahoogroups.com, Richard Wesley <hawkfish@...>
wrote:
select a,b,c
from d
group by 1,2,3
is exactly the same as
select a,b,c
from d
[order by 1,2,3]
The implicit order by of a grouping is simply a Firebird
implementation artifact. In the absense of a having clause or an
aggregate, it does not make sense to use group by.
you do not specify any 'where' conditions, the entire table must be
read at some time. So the grouping (if one can call it that when there
is nothing to group, or at least the ordering) can be done in one of
two ways. Either everything can be read from the disk in storage order
and sorted in memory, or the index can be read from the disk and the
records can then be read in the correct order.
You expect the second behaviour here, but as Sean points out, on
'disk' based technology, that will cause thrashing. Instead of reading
all the data in a single pass like storage order reading does, it
reads a bit then has to wait for the disk and heads to move to the
position of the next data (time measured in milliseconds). It can take
a lot longer to read the table like this, in most cases, longer than
the total time of a storage order read and in memory sort.
Adam
wrote:
>It does not make sense though.
>
> On Jan 31, 2007, at 16:28, Adam wrote:
>
> > Why do you need to group by unless you have aggregates or a having
> > clause?
>
> Because one is trying to view the dimension structure?
select a,b,c
from d
group by 1,2,3
is exactly the same as
select a,b,c
from d
[order by 1,2,3]
The implicit order by of a grouping is simply a Firebird
implementation artifact. In the absense of a having clause or an
aggregate, it does not make sense to use group by.
>You are right here, but I think Sean hit the nail on the head. Because
> > It is quite possible an optimisation noone has considered writing
> > because it seems at first glance to me to be unusual query.
>
> Adding an aggregation (SUM("Gross Profit")) to the WHERE did not
> change the plan or the speed, so it looks like the "artificiality" of
> the query is not the problem.
you do not specify any 'where' conditions, the entire table must be
read at some time. So the grouping (if one can call it that when there
is nothing to group, or at least the ordering) can be done in one of
two ways. Either everything can be read from the disk in storage order
and sorted in memory, or the index can be read from the disk and the
records can then be read in the correct order.
You expect the second behaviour here, but as Sean points out, on
'disk' based technology, that will cause thrashing. Instead of reading
all the data in a single pass like storage order reading does, it
reads a bit then has to wait for the disk and heads to move to the
position of the next data (time measured in milliseconds). It can take
a lot longer to read the table like this, in most cases, longer than
the total time of a storage order read and in memory sort.
Adam