Subject Re: [Firebird-Architect] Clustered indexes
Author Ann W. Harrison
Thomas Steinmaurer wrote:
>>> what about support for clustered index's?
>>... the first answer is "over my dead body." If anyone is
>>interested in more specifics - I'll be glad to continue the discussion.
> Yes, I would be interested, if you don't mind.
First, Dmitry's explanation really covers the positive side of the issue:

> Why do you think we need them? Unlike other databases, our index
> scan produces an optimized bitmap, so data pages are always read
> in their storage order. And we cannot avoid looking at data pages
> anyway, as we're required to chase the record versions chain.

For the down side, you have to consider the different things that are
sometimes called clustering.

The simplest form puts the record data at the bottom (aka leaf level) of
the index, called data in indexes.

A more complex form stores records on data pages, but attempts to store
records in key order, called clustered records.

Even without multi-generational records, each of these has serious

The data-in-index model makes the index leaf level enormously wide,
leading to serious problems of depth, and a very slow index.
Furthermore, indexes with large data segments lead to frequent page
splits which are complicated and slow.

The clustered-record model requires that the designer accurately
understand the distribution of data - leaving enough space to avoid
overflow but not leaving so much space that pages are half full. It
also complicates reuse of space because free space can only be
reallocated if the page is completely empty, or if you have a record
that fits in its previously assigned value range.

Now, lets add a multi-generational record to the mix. Where do you put
the back versions? How do you find them?

So, the additions to "over my dead body" are

1) that we have a mechanism described by Dmitry that is efficient for
all indexes, not just a clustered index

and 2) that clustered indexes work badly unless very well tuned in
non-multi-generational database, and even worse in a multi-generational