Subject: Re: [ib-support] Table size and indexing
Author: Ann W. Harrison
At 08:17 AM 6/19/2001 +0800, Andy Canfield wrote:

>Suppose I have two tables:
> CREATE TABLE Alpha (
> AlphaIdent INTEGER NOT NULL PRIMARY KEY,
> AlphaName VARCHAR(30));
> CREATE TABLE Beta (
> BetaIdent INTEGER NOT NULL PRIMARY KEY,
> AlphaIdent INTEGER REFERENCES Alpha,
> etc... );
>Now as far as I know there are two indexes; an index for table Alpha on
>the values in Alpha.AlphaIdent and an index for table Beta on the values
>in Beta.BetaIdent.

There is also an index on Beta.AlphaIdent, and that is the index that
causes the problem.

>Your discussion seems to imply that there is another index on table Beta
>on values from column Beta.AlphaIdent.

And so there is.

>You talk about selectivity being low. If by selectivity you mean the ratio
>of the number of distinct values to the number of table rows, then the
>selectivity of a primary key index is always one hundred percent since
>primary keys must be unique.

The selectivity of the index on Beta.AlphaIdent depends on the ratio of
the sizes of Alpha and Beta. If Alpha has 10 values and Beta has 100,000
rows, the average value in the Beta.AlphaIdent index will have 10,000
duplicates. That gets to be a problem in two cases: select and garbage
collection.
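You can see the skew for yourself with something like this (just an
illustration against the tables above, of course):

    SELECT AlphaIdent, COUNT(*)
      FROM Beta
     GROUP BY AlphaIdent;

With 10 rows in Alpha and 100,000 in Beta, each group comes back with
something on the order of 10,000 rows, and every one of those duplicates
is an entry in the Beta.AlphaIdent index.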

The select problem occurs because InterBase tries to make use of
all applicable indexes, and isn't smart enough about ignoring indexes
with very poor selectivity. As many of you know, an indexed retrieval
in InterBase is done in two stages. First, the engine examines the
indexes, building a bitmap of qualifying records. It then combines
the bitmaps using logical ANDs and ORs. The result is a bitmap of
records that satisfy all indexed conditions, ordered by data page.
The second phase is reading those records and applying any unindexed
conditions.
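To make that concrete, suppose you ran something like this against the
tables above (the values are mine, picked only for illustration):

    SELECT *
      FROM Beta
     WHERE AlphaIdent = 3        /* indexed, roughly 10,000 matches */
       AND BetaIdent > 99000;    /* indexed, far fewer matches      */

The engine builds one bitmap from the index on Beta.AlphaIdent, another
from the primary key index on BetaIdent, ANDs them together, and only
then goes out and reads the few records that survive.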

The two-phase index look-up is one reason why InterBase performs well
without explicit placement of indexes and data. At no time does it
switch back and forth, reading one index entry, then one record, then
back to the index, and back to the records... never, unless you've
asked for an index walk in place of a sort.

The reason a non-selective index is a problem is that it builds a
large bitmap, then discards nearly all the rows in it when it is
combined with the other bitmaps. A colossal waste of time.

The garbage collection problem occurs after the key value is modified or
the row is deleted. Typically the problem is deletion, and typically
older rows are deleted before newer ones. When a duplicate key value
is stored in an index, it goes at the head of the chain of duplicates,
so the oldest rows end up at the end of the chain and the newest at the
beginning. To clean up an old row, InterBase has to read through the
duplicates until it finds the entry being removed, and reading all those
entries for every removal is what thrashes the cache.
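In terms of the example tables, the classic trouble-maker is a purge of
old rows, something like:

    DELETE FROM Beta
     WHERE BetaIdent < 50000;   /* the oldest half of the table */

Each of those rows carries an AlphaIdent value whose index entry sits
near the end of a chain of thousands of duplicates, so the garbage
collector walks most of the chain for every row it cleans up.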

At the moment, you cannot drop the index that is automatically
created on Beta.AlphaIdent as part of the foreign key declaration.
The only solution is to use triggers rather than foreign keys to
enforce consistency in such cases.
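Sketched very roughly, the insert side might look like the following.
The exception and trigger names are made up, and you'd want matching
triggers for updates of Beta.AlphaIdent and for deletes and updates
on Alpha:

    /* sketch only: exception and trigger names are invented */
    CREATE EXCEPTION no_such_alpha 'No matching row in Alpha';

    SET TERM ^ ;
    CREATE TRIGGER Beta_AlphaIdent_Ins FOR Beta
    BEFORE INSERT
    AS
    BEGIN
      /* refuse the insert if the referenced Alpha row does not exist */
      IF (NEW.AlphaIdent IS NOT NULL AND
          NOT EXISTS (SELECT 1 FROM Alpha
                       WHERE Alpha.AlphaIdent = NEW.AlphaIdent)) THEN
        EXCEPTION no_such_alpha;
    END ^
    SET TERM ; ^

The point of the exercise is that no index gets created on
Beta.AlphaIdent, so neither the optimizer nor the garbage collector
ever has to wade through the long duplicate chains.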


Regards,

Ann
www.ibphoenix.com
We have answers.