Subject Re: [firebird-support] Query optimization help
Author Helen Borrie
At 10:07 PM 17/04/2005 +0200, you wrote:

>Aage Johansen wrote:
>
> > I don't see it too often - maybe I take these points into consideration as
> > I write SQL statments ... I think one must accept that optimisert aren't
> > infallible.
>
>Yes, of course. It's just that this is a frequent problem for me. In a
>commercial product I would complain to the manufacturer. Loudly.

And the manufacturer would simply come back to you with advice to avoid
indexes of low selectivity (like e.g. FK's on status fields, type fields,
Yes/No fields, et al.) (A commercial manufacturer would change you for
this advice...) These problems are specific to the way indexes are
maintained by this engine (and the vendor has addressed them in the next
version).

It's more often than not a legacy from file-based data management systems
like Paradox, MySQL and Access, which maintain indexes in files, not trees,
and so don't care about selectivity, but they do need to use indexes to
achieve what the better, purpose-designed SQL engines actually have
designed into them. A related source of faults is from apps that were
written on the assumption that all DBMSs maintain indexes in files and need
to use them for all of the the same things.

With Fb, currently, there is an intrinsic problem with having a mandatory
index on foreign key columns, when many index entries reference few primary
key values. Often, it will get used (not ideal), even if you have a better
index targeting that column. If it chooses not to use any index -- as it
can do when it is confused by a choice -- then often you should be
grateful. This kind of mess is common when you import Paradox's
hierarchical keys into Firebird.

For an example of the kind of indexing that is messing you up, the
single-column index on an Account Type field in a table of transactions
will make queries on that table abominable if that column is searched or
ordered on, because of the long duplicate chains formed as the table
grows. A composite index on that column + a low-selective column, such as
an int PK, will avoid this dup chaining and help the query's
performance. In a query where the FK index is chosen over the "good index"
you can step in with a PLAN clause and force the optimiser to use the good one.

Another thing to consider is how seriously you need that FK
relationship. If it's just for validation, you can avoid defining the FK
altogether and do the validation in a trigger instead. In situations where
the "lookup table" is very stable, you tend to trade off a lot of
performance for little or no gain in RI protection. If you feel you must
have that RI protection and "action effects", again it can be done with
triggers.

Note that we're not talking here about *all* FK relationships, only those
that reference very small lookup sets, or that reference larger lookup sets
but, in practice, utilise only a small number of the actual possible
values. My classic example here was an electoral application that
collected "Country of birth" in the elector profile. This was linked to
all of the countries in the world (around 370 at the time) but 85% of the
electors were born in the country that was collecting and storing the
data!! Country codes at the time were not changing much, so, given the
distribution of the data, this FK wasn't justified.

There is idealism (pure relational theory) and there is pragmatism (taking
on board real-world issues like data distribution and the degree to which
an engine can implement one ideal without compromising other ideals too
much). Designing well-performing structures is a matter of balancing
ideals (which are not concerned with performance) with reality (which
usually is!)

./hb