Subject Re: [ib-support] Please help me with query plan
Author Svein Erling Tysvær
I wrote:
>I assume IDX_DOCDATE is referring to 'and d.DocDate < CURRENT_TIMESTAMP',
>something which in many cases will refer to most records. Such an index is
>futile to use, get rid of it in your plan by changing to 'and (d.DocDate <
>CURRENT_TIMESTAMP or 2=0)'
>
>(it doesn't matter what numbers you choose, just
>something that always evaluates to false - we're just trying to limit the
>optimiser in choosing indexes).

and Pete Clark answered:
>Can you elaborate on that, please? How does this help?

Sure.

Indexes are normally useful for speeding up queries, but it does take computers a bit of time to use them. The ideal situation is where you uniquely can identify the row you are looking for using one index, e.g. 'SELECT NAME FROM PERSON WHERE PIN_CODE = <a personal identity number>'. In such a situation an index on PIN_CODE is of great value. However, if the column(s) for which there is an index does not significantly reduce the number of rows to be returned, using the index may slow down the execution of the query, e.g. you would want to avoid using an index for birthdate in the query 'SELECT NAME FROM PERSON WHERE BIRTH_DATE < YESTERDAY'. To avoid this you have to tell the optimizer that it cannot use an index - either by modifying the column or by adding an or clause. In the above example, I would use either 'SELECT NAME FROM PERSON WHERE (BIRTH_DATE < YESTERDAY or 4=2)' or 'SELECT NAME FROM PERSON WHERE BIRTH_DATE+0 < YESTERDAY'.

My impression is that the deeper you get down the plan, the more important it becomes make sure not too many or too few indexes are used.

An example: I work in a cancer registry, where we register cancers and links them to cancerreports (each cancer are typically reported between 1 and 5 times). Previously, the reports were only linked to a person and not to individual cancers, so for some old cases we do not know which cancer a particular report refers to if the person has several tumors and for these the CANCERID column in the CANCERREPORT table is set to 0. Hence, I frequently use queries similar to

SELECT <something> FROM CANCER
JOIN CANCER_REPORT ON CANCERREPORT.PINCODE = CANCER.PINCODE
WHERE CANCER.CANCER_TYPE = <some kind of cancer>
AND (CANCERREPORT.CANCERID=CANCER.ID OR CANCERREPORT.CANCERID = 0)
AND CANCER.DIAGNOSIS_DATE BETWEEN <from date> AND <to date>

This gives a plan like PLAN JOIN(CANCER(INDEX_CANCERTYPE, INDEX_DIAGNOSIS_DATE), CANCERREPORT(INDEX_PINCODE, INDEX_CANCERID, INDEX_CANCERID). Since most persons only have very few cancer reports, it is generally quicker to avoid using the INDEX_CANCERID when we already use INDEX_PINCODE. I do this by changing the query to

SELECT <something> FROM CANCER
JOIN CANCER_REPORT ON CANCERREPORT.PINCODE = CANCER.PINCODE
WHERE CANCER.CANCER_TYPE = <some kind of cancer>
AND (CANCERREPORT.CANCERID=CANCER.ID OR CANCERREPORT.ID = 0 or 4=2)

Sometimes I also eliminate the INDEX_DIAGNOSIS_DATE as well. This depends on how long the time span is - that index' usability increases as the time span decreases.

HTH,
Set
- I support Firebird, I am a FirebirdSQL Foundation member.
- Join today at http://www.firebirdsql.org/ff/foundation/