Subject RE: [IB-Architect] Data clustering
Author Jim Starkey
At 10:03 PM 6/28/01 -0700, David Schnepper wrote:
>
>There is lots of value in localizing data --
>
>For instance, a few months back I made a change to a 63M row table
>(um, Oracle)
>

With a traditional index design, an engine bounces between
index pages and data pages. If the data pages are randomized,
the cache hit rate goes down and physical reads go up.

The JRD index scheme first walks the index, then walks the resultant
records in bitmap order, which is physical order within the table,
guaranteeing a cache hit on the second and subsequent records on each data page.
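
To put rough numbers on the difference, here is a minimal sketch, not the
JRD code. It counts page-cache misses when the same records are fetched in
index (key) order versus record-number order. The function name, the
one-page LRU cache, the ten-records-per-page layout, and the sample record
numbers are all assumptions made up for illustration.

```python
from collections import OrderedDict

def count_physical_reads(record_numbers, records_per_page, cache_pages):
    """Count data-page fetches that miss a small LRU page cache."""
    cache = OrderedDict()          # page number -> True, oldest first
    misses = 0
    for recno in record_numbers:
        page = recno // records_per_page
        if page in cache:
            cache.move_to_end(page)        # cache hit: mark as most recent
        else:
            misses += 1                    # cache miss: physical read
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page
    return misses

# Record numbers in the order an index walk might return them:
# sorted by key, so they alternate between two data pages.
index_order = [40, 3, 41, 7, 42, 1, 43, 5, 44, 2]

# The same records in record-number order, which is what walking a
# bitmap of record numbers yields.
bitmap_order = sorted(set(index_order))

reads_index = count_physical_reads(index_order, records_per_page=10, cache_pages=1)
reads_bitmap = count_physical_reads(bitmap_order, records_per_page=10, cache_pages=1)
print(reads_index, reads_bitmap)
```

With this deliberately hostile layout the key-order walk reads a page for
every record, while the record-number-order walk reads each page once.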

A system like Oracle dies without clustered indexes. However, the
proper comparison is not Oracle with and without a clustered index,
but Oracle with a clustered index against JRD.

Among the many problems of clustered indexes is the absolute
requirement that the page space be chosen correctly. Too big
and the records are sparse, causing many otherwise unnecessary
page reads. Too small and pages overflow, causing a mess.

Performance numbers for clustered indexes are almost always taken
on a carefully designed and freshly loaded database. Real life
numbers don't approach them.

Long experience with the foibles of DEC's RMS ISAM system led
me to the conclusion that there had to be a better way. There
is, and Rdb/Eln (aka JRD), Interbase/Firebird (aka JRD), and
Netfrastructure (aka Netfrastructure) all have it.

Before somebody does something really dumb with regard to
clustered indexes, do the numbers first.

>
>Note: I have this vague memory that GBAK makes it's own "system table"
>GBAK$something ? Scratch that, I'm thinking QLI$something. But why
>not have GBAK store options in a "gbak system" table ? This would
>save from the "lost script" problem -- and be compatible with existing
>metadata.
>

Didn't use to.

I like flexibility. Introducing a gbak (aka burp) control language
would also allow selective backup: certain tables skipped, some tables
backed up just as metadata, and some tables fully populated. It would
probably be good for a zillion useful things.

While we're on the subject, the integrated Netfrastructure restore
class gives three options on restore:

1. Replace. All existing records are deleted before restore.
2. Override. Any existing record (by primary key) will be
replaced, non-duplicate records from the file are added
to the table, and other existing records are retained.
3. Merge. Like override, but precedence is given to existing
records over records from the backup.

This assumes an incremental restore, which gbak doesn't support
but could.
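
The three options can be sketched as follows. This is a toy model, not the
Netfrastructure restore class or anything in gbak: the function name, the
mode strings, and the use of dicts keyed by primary key are assumptions
for illustration only.

```python
def restore(table, backup, mode):
    """Return the table contents after restoring `backup` into `table`.

    Both arguments are dicts mapping primary key -> record.
    """
    if mode == "replace":
        # All existing records are dropped; only the backup survives.
        return dict(backup)
    if mode == "override":
        # Backup wins on a key collision; other existing records stay.
        return {**table, **backup}
    if mode == "merge":
        # Existing records win on a key collision; new keys are added.
        return {**backup, **table}
    raise ValueError(f"unknown restore mode: {mode!r}")

existing = {1: "old-a", 2: "old-b"}
from_backup = {2: "new-b", 3: "new-c"}

replaced = restore(existing, from_backup, "replace")
overridden = restore(existing, from_backup, "override")
merged = restore(existing, from_backup, "merge")
```

Only key 2 collides here, so override and merge differ in exactly that one
record.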



Jim Starkey