Subject RE: [firebird-support] Indexes and optimising queries
Author Svein Erling Tysvær
> Hello all,

Hi Randall!

> First off, does the order in which fields are defined within an index matter to
> the query optimizer? For example in my table TableX with columns Col1, Col2, and
> Col3. Do these indexs differ as far as the optimizer is concerned:
>
> CREATE INDEX IDX_TEST1 ON TABLEX (COL1, COL2) vs
>
> CREATE INDEX IDX_TEST1 ON TABLEX (COL2, COL1)

Yes, the order matters. Firebird can choose to use the index for only the first fields within it. Though another thing is that Firebird can use several indexes for the same table, so you could just create two separate indexes. It would typically be slightly slower than a combined index, but since my queries rarely pushes Firebird to its limits, I find single-field indexes easier to read in a plan, and multi-field indexes may require creation of a lot of indexes to fit various needs, I tend to use single-field indexes myself. Though I'm not saying multi-field indexes are useless, they can be very useful in some situations.

> Again with the above index defs, if I'm doing a JOIN to TableX on Col1 - will
> either index be used? neither index is used?

The optimizer may choose to use the index on (COL1, COL2), whereas the index on (COL2, COL1) is useless unless COL2 is referred to in a WHERE or JOIN clause.

> Next Q...
> We never allow users to access tables directly. They always access views. For
> the sake of simplicity, let us say that there is a date column which is checked
> within the definition of the view:
>
> Create View MyView(.....)
> select * From MyTable
> where DateCol > [someDate]
>
> New let's say we are doing a join with the above MyView. Let's say it is
> something like this:
>
> Select x.Col1, x.Col2, x.Col3,
> v.*
> From TABLEX x
> JOIN MyView v on (v.someCol = x.someCol)
>
> If I wanted to optimize the above, do I want an index on:
> 1) v.someCol
> 2) x.SomeCol
> 3) v.DateCol
> 4) v.someCol and v.DateCol
> 5) Some other combination of above

First, I rarely use views, so my answer may be incomplete.

I don't think there is a definitive answer to this question, it all depends on your situation. I'm pretty certain I would have an index on either of the someCol fields, and if someDate was higher than the vast majority of the entries in MyTable, then I suppose indexing x.someCol and v.DateCol would be the first thing for me to try, in the hope that a plan similar to:

PLAN JOIN(V INDEX(DateColIndex), X INDEX(someColIndex))

could be used and be quick.

> Last Q...
> Is it the case that the Query Optimizer will use an index, even if the
> statistics for that index are very poor? Let's assume the stats are .5 (about as
> bad as they can be). Still it cuts the number of entities roughly in half
> (assuming the column data is split 50/50). So does it use this index? Or are
> their cases where the overhead of using an index means that performance will go
> backward and hence opts to do a natural join instead?

I don't know for sure, but would expect the optimizer to use the index. There are cases where such an index could be useful (column data split 99/1) and the optimizer doesn't get such information from the indexes at prepare time. I'd by far prefer the optimizer to believe I have reasons for creating the index (and use it), rather than just ignore the index I've created. The one exception to this would be tables containing only a handful of records, but I normally don't care whether the optimizer use an index or not on tables containing less than 10 or 20 records.

There are of course situations when I prefer indexes not to be used, if a selective index is already

HTH,
Set