Subject | RE: [firebird-support] question about optimizing group by |
---|---|
Author | Svein Erling Tysvær |
Post date | 2008-11-21T10:37:28Z |
The Firebird optimizer tries to decide how to get the desired output at the lowest cost. It does so by analyzing which indexes can be used to find the desired rows and avoid looking at those that aren't interesting. Generally, the most interesting part from a performance point of view (and hence the optimizer), is the WHERE and JOIN clauses. Sometimes, indexes are also used for sorting, but that isn't equally important. I haven't actually checked to see whether Firebird can use indexes for GROUP BY, but I don't think it would make a great difference performance-wise.
Your query has neither a WHERE nor a JOIN clause, so it has to look at all records in the table anyway. The main difference between these two statements:
select t.pri_key, t.other_field
from t
group by t.pri_key, t.other_field
and
select t.pri_key, max(t.other_field)
from t
group by t.pri_key
I'd say is that the latter (potentially) returns fewer records. Theoretically, t.otherfield is also used from two different perspectives, in the former case every distinct value for each t.pri_key is shown, in the latter case it is using an aggregate function (like sum, avg, min).
If pri_key is a unique field and you don't join to any other table that may result in this returning two rows, then I don't understand why using GROUP BY at all, you would get the same result by using ORDER BY (due to how Firebird currently uses GROUP BY, apart from that it could break existing applications, there's no guarantee that new versions of Firebird will return GROUP BY sorted).
Generally, to avoid indexes from being used for a particular field, adding +0 or ||'' can be useful in your queries.
Have I answered your question, or didn't I quite understand what you were asking?
Set
-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of woodsmailbox
Sent: 21. november 2008 00:42
To: firebird-support@yahoogroups.com
Subject: [firebird-support] question about optimizing group by
Hi,
Question: does firebird optimizer knows to optimize something like:
select t.pri_key, t.other_field
from t
group by t.pri_key, t.other_field
in the sense that it should ignore t.other_field from the sorting
table?
I have seen advice to use max(t.other_field) to avoid grouping by
t.other_field.
In that case, wouldn't it help to have an aggregate function that
would just choose the first row in the group? min/max still needs an
index.
Is there any other workaround for this? Thanx!
Your query has neither a WHERE nor a JOIN clause, so it has to look at all records in the table anyway. The main difference between these two statements:
select t.pri_key, t.other_field
from t
group by t.pri_key, t.other_field
and
select t.pri_key, max(t.other_field)
from t
group by t.pri_key
I'd say is that the latter (potentially) returns fewer records. Theoretically, t.otherfield is also used from two different perspectives, in the former case every distinct value for each t.pri_key is shown, in the latter case it is using an aggregate function (like sum, avg, min).
If pri_key is a unique field and you don't join to any other table that may result in this returning two rows, then I don't understand why using GROUP BY at all, you would get the same result by using ORDER BY (due to how Firebird currently uses GROUP BY, apart from that it could break existing applications, there's no guarantee that new versions of Firebird will return GROUP BY sorted).
Generally, to avoid indexes from being used for a particular field, adding +0 or ||'' can be useful in your queries.
Have I answered your question, or didn't I quite understand what you were asking?
Set
-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of woodsmailbox
Sent: 21. november 2008 00:42
To: firebird-support@yahoogroups.com
Subject: [firebird-support] question about optimizing group by
Hi,
Question: does firebird optimizer knows to optimize something like:
select t.pri_key, t.other_field
from t
group by t.pri_key, t.other_field
in the sense that it should ignore t.other_field from the sorting
table?
I have seen advice to use max(t.other_field) to avoid grouping by
t.other_field.
In that case, wouldn't it help to have an aggregate function that
would just choose the first row in the group? min/max still needs an
index.
Is there any other workaround for this? Thanx!