Subject Re: [firebird-support] Re: Natural Plan when grouping by 5 indexed columns?
Author Richard Wesley
On Feb 1, 2007, at 13:37, Adam wrote:

> --- In, Richard Wesley <hawkfish@...>
> wrote:
>> On Jan 31, 2007, at 16:28, Adam wrote:
>>> Why do you need to group by unless you have aggregates or a having
>>> clause?
>> Because one is trying to view the dimension structure?
> It does not make sense though.
> select a,b,c
> from d
> group by 1,2,3
> is exactly the same as
> select a,b,c
> from d
> [order by 1,2,3]

I don't think that is correct - the two queries return vastly
different numbers of rows. The first would return 1932 rows; the
second all 54860 rows, with many duplicates.
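The row-count difference is easy to demonstrate on a toy table. This is a sketch using Python's sqlite3 standard library rather than Firebird, with invented table and column names and made-up data:

```python
import sqlite3

# In-memory toy table standing in for the real one (names invented).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE d (a INTEGER, b INTEGER, c INTEGER)")
# Six stored rows, but only three distinct (a, b, c) combinations.
con.executemany("INSERT INTO d VALUES (?, ?, ?)",
                [(1, 1, 1), (1, 1, 1), (1, 2, 1),
                 (1, 2, 1), (2, 1, 1), (2, 1, 1)])

grouped = con.execute("SELECT a, b, c FROM d GROUP BY a, b, c").fetchall()
ordered = con.execute("SELECT a, b, c FROM d ORDER BY a, b, c").fetchall()

print(len(grouped))  # 3 -- one row per distinct combination
print(len(ordered))  # 6 -- every stored row, duplicates included
```

The GROUP BY collapses duplicates to one row per group; the ORDER BY merely sorts and returns everything.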

> The implicit order by of a grouping is simply a Firebird
> implementation artifact. In the absence of a having clause or an
> aggregate, it does not make sense to use group by.


Unless you only want the distinct values. Which is exactly what we
are trying to discover here.
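In standard SQL terms, a group-by with no aggregates and no HAVING clause yields the same rows as SELECT DISTINCT. A minimal check of that equivalence, again sketched with sqlite3 and invented data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE d (a INTEGER, b INTEGER, c INTEGER)")
con.executemany("INSERT INTO d VALUES (?, ?, ?)",
                [(1, 1, 1), (1, 1, 1), (2, 1, 1)])

# Same distinct combinations, expressed two ways.
grouped = con.execute(
    "SELECT a, b, c FROM d GROUP BY a, b, c ORDER BY a, b, c").fetchall()
distinct = con.execute(
    "SELECT DISTINCT a, b, c FROM d ORDER BY a, b, c").fetchall()

print(grouped == distinct)  # True -- either form returns the distinct rows
```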

>>> It is quite possible an optimisation no one has considered writing
>>> because it seems at first glance to me to be an unusual query.
>> Adding an aggregation (SUM("Gross Profit")) to the select list did not
>> change the plan or the speed, so it looks like the "artificiality" of
>> the query is not the problem.
> You are right here, but I think Sean hit the nail on the head. Because
> you do not specify any 'where' conditions, the entire table must be
> read at some time. So the grouping (if one can call it that when there
> is nothing to group, or at least the ordering) can be done in one of
> two ways. Either everything can be read from the disk in storage order
> and sorted in memory, or the index can be read from the disk and the
> records can then be read in the correct order.
> You expect the second behaviour here, but as Sean points out, on
> 'disk' based technology, that will cause thrashing. Instead of reading
> all the data in a single pass like storage order reading does, it
> reads a bit then has to wait for the disk and heads to move to the
> position of the next data (time measured in milliseconds). Reading the
> table like this can take a lot longer - in most cases, longer than the
> total time of a storage-order read plus an in-memory sort.
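The trade-off described above can be roughed out with back-of-the-envelope numbers. All figures below are invented for illustration, not measured on Firebird:

```python
# Invented, illustrative figures for a spinning disk.
SEQ_READ_MS = 0.1     # reading the next page in storage order
RANDOM_SEEK_MS = 8.0  # moving the heads to an arbitrary page
PAGES = 10_000        # assumed page count for the table

# Strategy 1: read every page in storage order, then sort in memory
# (the sort cost is omitted here; it is typically small next to the I/O).
storage_order_ms = PAGES * SEQ_READ_MS

# Strategy 2: walk the index and fetch pages in index order,
# pessimistically paying a full seek per fetch (worst-case thrashing).
index_order_ms = PAGES * RANDOM_SEEK_MS

print(storage_order_ms)  # 1000.0 ms
print(index_order_ms)    # 80000.0 ms -- 80x slower in this worst case
```

Under these assumptions the optimizer's preference for a natural scan plus sort is entirely rational.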

I think my point is that one could just read the indices in disk
order and skip the actual row data, reducing the number of reads by
95%. Adding an aggregation would of course require a table scan, and
it looks like FB assumes that this is what one would be doing if one
were grouping. Which is fine, but it is the sort of issue one runs
into when trying to do relational OLAP.
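A rough sanity check on that 95% figure, using the 54860-row count from the thread and assumed (invented) densities for data pages versus index pages:

```python
ROWS = 54_860                  # total rows, from the thread above
ROWS_PER_DATA_PAGE = 10        # assumed: full rows stored per data page
ENTRIES_PER_INDEX_PAGE = 200   # assumed: key entries per index page

data_pages = -(-ROWS // ROWS_PER_DATA_PAGE)        # ceiling division
index_pages = -(-ROWS // ENTRIES_PER_INDEX_PAGE)

reduction = 1 - index_pages / data_pages
print(data_pages, index_pages)  # 5486 275
print(round(reduction, 2))      # 0.95 -- roughly the 95% claimed
```

The exact ratio depends on key width and page size, but because index entries are far denser than full rows, an index-only scan reads an order of magnitude fewer pages.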
Richard Wesley
Senior Software Developer
Tableau