Subject Re: [IB-Architect] Foreign Key indexes
Author Jim Starkey
At 02:54 AM 4/7/00 +0800, Joseph Alba wrote:
>>What do you mean by dense indexes? How do they solve the problem?
>>
>I used the term dense to contrast it with sparse. Like dense is the antonym
>of sparse, right?
>
>(Also, I was thinking in terms of a sparse index - where one pointer to
>one record is kept per key value, and duplicates are traversed in
>linked-list fashion relative to that first record - versus a dense index,
>where duplicates do not have to be traversed in linked-list fashion,
>because there is a pointer to (almost) every record / key combination
>(thus dense), but which is much larger than a sparse index in size. Not
>that this sparse/dense is applicable to this discussion; it's just to
>explain the use of the word dense.)
>
>But the main thing is,
>1. Is it not the case that the present indexing system (sparse) has a
>worst-case problem with multiple duplicate keys? I'm thinking in terms of
>algorithm analysis, where some algorithms have a very big gap between
>their best case and their worst case. And it's like this with the IB index
>algorithm - very good in its best case (unique keys with no selectivity
>problems), and very poor in its worst case (keys with many, many duplicate
>values - which can occur in the real world when implementing referential
>integrity constraints with foreign keys, where the referenced table is
>very small compared to the 'referencing' table). Example: implementing a
>referential integrity constraint between a maillist table with millions
>of records and a zipcode table.
>
>2. Would it be possible to conceive of an alternative index system - which
>we'll label dense, to contrast it with sparse - with pretty constant
>performance (a very small gap between its best-case and worst-case
>performance), which developers could call on if they know that their table
>keys will have selectivity problems? (Honestly, I'm thinking of XBase
>index performance, where seek performance is pretty constant.)
>
>One consideration is that IB is quite an old database system. (There is
>nothing negative in the label old, and I love IB.) The hardware
>considerations which existed then, like rattling-slow disk head movements
>and small hard disk sizes, might not be such a big deal anymore at this
>stage, given the advances in hard disk technology. So index algorithms
>which might not have been applicable then might be applicable now.
>
>So, the main question / vision is:
>
>Is it possible for IB to have another index system, in addition to the
>present one, so that indexes won't have selectivity issues and will be
>more forgiving to the poor database designers who might have no idea at
>all what 'selectivity' means?
>
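[The sparse/dense distinction described in the quoted message can be sketched roughly as follows. This is an illustration of the idea only, with invented names (`SparseIndex`, `DenseIndex`) - it is not InterBase's actual index format.]

```python
from collections import defaultdict

class SparseIndex:
    """One entry per distinct key value; duplicate records are chained
    behind the first hit and must be walked linearly."""
    def __init__(self):
        self.first = {}   # key -> first record id with that key
        self.chain = {}   # record id -> next record id with the same key

    def insert(self, key, rec):
        if key in self.first:
            # Prepend the new record to the duplicate chain.
            self.chain[rec] = self.first[key]
        self.first[key] = rec

    def lookup(self, key):
        rec = self.first.get(key)
        while rec is not None:
            yield rec
            rec = self.chain.get(rec)

class DenseIndex:
    """One entry per (key, record) pair; no chain to traverse, but the
    index itself is proportionally larger."""
    def __init__(self):
        self.entries = defaultdict(list)   # key -> list of record ids

    def insert(self, key, rec):
        self.entries[key].append(rec)

    def lookup(self, key):
        return list(self.entries[key])
```

In the maillist/zipcode example from the quoted message, the sparse form ends up with a handful of distinct keys, each dragging a chain of thousands of records.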

To answer your questions back to front:

Yes, it is possible to have several index types. This ability has
been used several times to phase in variations in index implementation
without forcing an incompatible ODS change (in all cases, the old
scheme was dropped at the next major ODS change).

The selectivity issues are related to optimizer estimates of
stream cardinality. The problem is that although selectivity
is easy (but not cheap) to compute, maintaining accurate selectivity
is both difficult and expensive. Even maintaining table size is
difficult and expensive, especially in the classic architecture.
[If you wish, I'll explain the reasons for this.] Given inaccurate
cardinality and selectivity, it is obviously impossible for the
optimizer to compute an accurate retrieval cost for every join
order permutation.
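[A rough sketch of the estimation problem described above: the optimizer multiplies cardinality by selectivity to cost an indexed retrieval, so stale statistics skew the estimate badly. The names are invented for illustration; this is not InterBase's optimizer code.]

```python
def estimated_rows(cardinality, selectivity):
    """Expected rows returned by an equality lookup via an index."""
    return cardinality * selectivity

def compute_selectivity(keys):
    """Selectivity is easy but not cheap: it takes a full scan of the
    keys to count distinct values."""
    return 1.0 / max(1, len(set(keys)))

# Statistics captured when the table was small and keys were nearly unique...
stale_card, stale_sel = 10_000, 1 / 5_000
# ...versus the table as it stands now, with many duplicates added since.
true_card, true_sel = 1_000_000, 1 / 50

print(estimated_rows(stale_card, stale_sel))  # optimizer's guess: 2.0 rows
print(estimated_rows(true_card, true_sel))    # reality: 20000.0 rows
```

With the estimate off by four orders of magnitude, any join-order cost comparison built on it is essentially meaningless.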

I don't believe there are any dense/sparse index scan issues with
the current scheme. I realize that you do, but you don't seem
to know how the current scheme works and you haven't suggested
an alternative, so it is rather difficult to know if you have a
serious suggestion or are just blowing smoke.

Disks haven't changed as much as you might think. Density has
gone through the ceiling, of course, but that relates only to
transfer rate, which has never been a significant factor. The
two significant factors are rotational delay and seek time. The
2311 drive that shipped on IBM 360s in 1965 was probably 2400 rpm.
Current fast drives are 7200 rpm. That's a factor of three. The
seek times are probably comparable given that the power required
to move the actuator goes up with the fourth power of the speed.
The problem with modern small disks is not power, per se, but the
inability of the drive to dissipate heat. So modern disks are
probably somewhere between two and four times faster than museum
relics. On the other hand, a 750 MHz processor is probably a
thousand times the speed of the 360/30 that drove the 2311. As
seen from the processor, the disk is hundreds of times slower than
in days of yore.
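[A quick numeric check of the rpm figures above: average rotational delay is half a revolution, so it scales inversely with rotation speed, and the 2400-to-7200 rpm comparison does come out to the cited factor of three.]

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay: half a revolution, in milliseconds."""
    revs_per_second = rpm / 60.0
    return 0.5 / revs_per_second * 1000.0

d_2311 = avg_rotational_delay_ms(2400)  # 1965-era drive: 12.5 ms
d_fast = avg_rotational_delay_ms(7200)  # "current" fast drive: ~4.17 ms
print(d_2311 / d_fast)                  # the factor of three
```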



Jim Starkey