Subject Re: [ib-support] Foreign keys and low selectivity
Author Helen Borrie
At 05:32 PM 26-08-02 +0930, you wrote:
>In reply to a question in another group I read:
>"If the lookup table has very few rows compared to the main tables that use
>it, then don't apply a formal FOREIGN KEY constraint to the main tables
>referencing it unless you want really rotten performance. (This is known
>as the Low Selectivity Problem. <g>)". (Helen Borrie)
>The statement might have been specific to a particular situation, but it
>makes be wonder
>about efficiency vs declarative integrity.
>Suppose there are N records of animals and 20 breed types where the breed
>types are stored
>in a lookup table and connected to the animal table via an integer field.
>Normally I would place a foreign key constraint on animal into breed type
>to ensure my
>programming doesn't store an invalid number in the animal table.
>There are 3 types of uses for this connection.
>1. Display animal details including their breed type.
>2. Select animal records based on selected breed type or types.
>3. Filter on breed type the animal records that have already been selected
>into a grid.
>For which of these 3 types of use and for what values of N is performance
>enhanced by not
>declaring a foreign key?
>Typically N will be between 500 and 100,000 animals.

20 keys shouldn't cause too much alarm if they are pointed to by only 500
rows. 20 could be marginal with 100,000, especially if most of the animals
are of only one or two breeds. It won't be too much of a problem if the
animals are evenly spread across all 20. Where it will really bite is in
the situation where a table of accounting transactions (large) is keylinked
to at least one (often more) lookup table of "type" lookups - account type,
transaction type, product type, etc.

What causes this problem is that a FK causes a mandatory index to be
created over the constrained column. As records are added, long chains of
duplicates get built up in the index, cause index reads to spread over many
database pages.

The general rule is NOT to have single-column indexes on columns of low
selectivity. It will take longer to traverse the index than to find the
values by a natural order search - so you are better off if this index
doesn't exist. At present, it's not possible to disable the index that
enforces the FK.

By all means index such columns, but make it a composite index with a
high-selectivity column, for example the PK if it is an integer. Place the
low-sel column in the first degree (leftmost position) of the index, to
ensure that the optimizer will use it.

Of course, the comp. index can't reference the PK of the lookup table, so
this index can't be used by the foreign key. Enhancements for Firebird
have been discussed - the ability to disable the FK index,
was tried out but it caused a bunch of obscure problems and was
shelved. It has also been mooted to find some way to let the FK use the
"workaround" composite index, though I haven't seen any theories about
whether it would be practicable without some far-reaching changes.