Subject Re: How to Speed up query containing Distinct
Author hay77772000
Hi Helen,

Thanks for the reply! I guess Firebird's indexes work differently
to SAP's!!

Have done what you said - removed all indexes on metrics table, and
then added single indexes on exectime and copies.

But, I am puzzled by what I see...

If I run the original query again, I get the same results:

SELECT distinct kiosk_id FROM Metrics WHERE ExecTime IS NOT NULL
AND Copies IS NOT NULL

=> PLAN SORT ((METRICS NATURAL))

If I try the following query, it does use the index:

SELECT distinct kiosk_id FROM Metrics WHERE ExecTime
= '09/11/2003 16:36:17:0000'

=> PLAN SORT ((METRICS INDEX (M_EXECTIME)))

However, if I then add a where clause on Copies, it doesn't seem to
include that second index:

SELECT distinct kiosk_id FROM Metrics WHERE ExecTime
= '09/11/2003 16:36:17:0000' AND copies='1'

=> PLAN SORT ((METRICS INDEX (M_EXECTIME)))

So, some questions:

- why does Firebird not use the index when comparing if NULL?

- when using 2 columns in the where clause and specifying values,
does Firebird use the second individual index but not show it?

- am I correct in my understanding that I only need to index on
columns that are in the WHERE clause, not columns that are being
selected?

Many thanks,

David

--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
> At 09:09 PM 23/09/2003 +0000, you wrote:
> >Hi,
> >
> >Is there any way to speed up this query:
> >
> >SELECT DISTINCT kiosk_id
> >FROM Metrics WHERE ExecTime IS NOT NULL AND
> >Copies IS NOT NULL
> >
> >The Metrics table has about 500,000 records in it,
> >and an index
> >which includes the kiosk_id, ExecTime and Copies. I would have
> >expected it to use this, but the plan that the Analyzer shows is a
> >sort:
>
> You have two indexes involving kiosk_id. Which one would the
optimizer use?
>
>
> >PLAN SORT ((METRICS NATURAL))
> >
> >The full definition of the table and indexes is below.
> >
> >Many thanks,
> >
> >David
> >
> >CREATE TABLE metrics (Kiosk_Id INTEGER, Form_Id INTEGER, Copies
> >INTEGER, Pgs INTEGER, Script_Id INTEGER, ExecTime TIMESTAMP, Id
> >INTEGER NOT NULL, Primary Key(Id));
> >
> >CREATE GENERATOR GEN_METRICS_ID;
> >
> >SET TERM ^ ;
> >
> >CREATE TRIGGER METRICS_BI FOR METRICS
> >ACTIVE BEFORE INSERT POSITION 0
> >AS
> >BEGIN
> >IF (NEW.ID IS NULL) THEN
> > NEW.ID = GEN_ID(GEN_METRICS_ID, 1);
> >END ^
> >commit ^
> >SET TERM ; ^
> >
> >
> >CREATE INDEX m_kioskId_formId_exectime ON metrics (Kiosk_Id,
> >Form_Id, ExecTime);
> >
> >CREATE INDEX m_kioskId_exectime_copies ON metrics (Kiosk_Id,
> >ExecTime, Copies);
> >
> >CREATE INDEX m_formId_exectime_copies ON metrics (Form_Id,
ExecTime,
> >Copies);
> >
> >CREATE INDEX m_scriptId_exectime_copies ON metrics (Script_Id,
> >ExecTime, Copies);
>
> Kill all of these indexes and replace them with single-column
indexes on
> any columns which actually NEED to be indexed. If any of these
columns is
> a foreign key, don't index it at all, since it already has an
index.
>
> As far as Kiosk_ID is concerned, if its selectivity is low (few
distinct
> occurrences across the entire table) then don't give it a single-
column
> index. Natural order will be faster. Or, index (Kiosk_ID,
ExecTime).
>
> OTOH, if Kiosk_ID has higher selectivity (thin distribution across
the
> table) then use a parameterised WHERE clause, e.g. on ExecTime, to
filter
> the output set to a realistic size. A client can't do anything
useful with
> tens or hundreds of thousands of records.
>
> Incidentally, given that (it seems) Kiosk_ID needs to be in every
record
> (at least for this query), I wonder why it is nullable.
>
> heLen