Subject Why does DISTINCT does not use the index in this simple case?
Author Marc Bettex
Hi everybody,

I'm trying to optimize a complex query, and by doing so, I found something that surprised me and I would like to ask if anybody have an explanation or some insight of what's going on here.

I'm using Firebird super server 2.5.1 on windows 7. I've got a database with a single table with one column filled with 250000 rows. This single column is the primary key, which implies that it is indexed and unique and contains ids from 000001 to 250000.

The statement "SELECT id FROM table" takes around 0ms to complete.
The statement "SELECT id FROM table GROUP BY id" also takes around 0ms to complete. The plan for this query uses the index of the primary key.
The statement "SELECT DISTINCT id FROM table" takes around 500ms to complete. The plan for this query does not use the index of the primary key.

I know that the last request is stupid, but my rather complex request that I'm trying to optimize also takes around 500ms on 250000 rows of data and is something like SELECT DISTINCT id, col1, col2, col3, [...] FROM table1, table2, table3, [...] WHERE [...] ORDER BY [...] ROWS x TO y. If I remove the DISTINCT from my query, the execution time drops to 1ms. So it seems to me that the bottleneck is the DISTINCT and if I can't optimize this simple query, I don't see how it would be possible to optimize my more complex query.

I've seen this post http://tech.groups.yahoo.com/group/firebird-support/message/100051 which is quite similar, but I did not really understand the answers. Someone said to make the database in the 3rd normal form but I don't really see how database normalization is related to DISTINCT. Two other people said that rows might have different values for different transactions, and thus the index cannot be used. But if that's the reason, then how could the index be valid for the GROUP BY clause, or any other part of any other SQL query?

If I'm mistaken regarding the answers of the other post, could someone give me an explanation? Does anybody have an idea of why Firebird does not use the index in my simple query and if it is possible to make Firebird use it?

Thanks a lot in advance and have a nice day,

Marc