Subject Re: [firebird-support] Why does DISTINCT does not use the index in this simple case?
Author Ann Harrison
On Wed, May 16, 2012 at 5:05 AM, Marc Bettex <bettex@...> wrote:

>
> I'm using Firebird super server 2.5.1 on windows 7. I've got a database
> with a single table with one column filled with 250000 rows. This single
> column is the primary key, which implies that it is indexed and unique and
> contains ids from 000001 to 250000.
>
> The statement "SELECT id FROM table" takes around 0ms to complete.
>

When you say "complete" do you mean that ti takes 0ms to get the first
record, or that it takes 0ms to get the whole result set. There is a
difference. An index will always be the fastest way to get the first row,
but may be measurably slower than a table scan plus a sort when you're
returning a large number of record. Index scans tend to jump from page to
page, reading a record here and a record there, while a table scan reads
each page only once.


> The statement "SELECT id FROM table GROUP BY id" also takes around 0ms to
> complete. The plan for this query uses the index of the primary key.
>

Again, do you mean the time to the first record or the time to the last?


> The statement "SELECT DISTINCT id FROM table" takes around 500ms to
> complete. The plan for this query does not use the index of the primary key.
>

Do you store your records in id order? You probably do, and that makes
the indexed access faster because it does not leap from page to page
randomly. It also makes the sort slower - the Firebird sort algorithm is
not very good when data arrives sorted.

>
> I've seen this post
> http://tech.groups.yahoo.com/group/firebird-support/message/100051 which
> is quite similar, but I did not really understand the answers. Someone said
> to make the database in the 3rd normal form but I don't really see how
> database normalization is related to DISTINCT.


Probably not a factor here.


> Two other people said that rows might have different values for different
> transactions, and thus the index cannot be used.


They were confused. You can't use an index to count rows without looking
at the data because a single row may have several index entries,
representing values from different versions of the record. But Firebird
can certainly use an index for grouping, provided it also checks that the
record versions themselves are appropriate for the running transaction.

> But if that's the reason, then how could the index be valid for the GROUP
> BY clause, or any other part of any other SQL query?
>
> If I'm mistaken regarding the answers of the other post, could someone
> give me an explanation? Does anybody have an idea of why Firebird does not
> use the index in my simple query and if it is possible to make Firebird use
> it?
>

Maybe you can force a plan, but solving the simple case will not solve the
problem of your real query.



> I know that the last request is stupid, but my rather complex request that
> I'm trying to optimize also takes around 500ms on 250000 rows of data and
> is something like



> SELECT DISTINCT id, col1, col2, col3, [...]

FROM table1, table2, table3, [...]

WHERE [...]

ORDER BY [...] ROWS x TO y.



> If I remove the DISTINCT from my query, the execution time drops to 1ms.
> So it seems to me that the bottleneck is the DISTINCT and if I can't
> optimize this simple query, I don't see how it would be possible to
> optimize my more complex query.
>

You've got two separate sorts in that query, one to find the distinct rows,
and one to put the rows in order so you can pick a subset wit the ROWS x TO
y. When using an index for navigation, Firebird can use only one
index(see below). So it must choose between using all the indexes that can
help reduces the intermediate product based on the WHERE clause, or walking
a single index to match the ORDER BY clause. It will never be able to walk
an index that matches the DISTINCT clause. Maybe with a complete
description of the query and the table and index definitions, someone will
be able to suggest a better way to express the query.

Good luck,

Ann


Normally, Firebird uses indexes to select data rather than to order it.
When operating in that mode, it creates a bitmap of record numbers that
match in one index, then if there is another index that can be used, it
creates a bitmap of those record numbers, and so on for all indexes that
can provide significant subsetting of the data. Firebird then combines the
bitmaps and retrieves the record in record numbers order.


[Non-text portions of this message have been removed]