Subject Re: [firebird-support] MS SQL Vs. Firebird Again
Author Ann W. Harrison
>David Johnson wrote:
> > Count, at worst, _should_ do an index space scan on the primary key
> > rather than a table space scan.

Multi-generational concurrency, as implemented in Firebird,
prevents counts from using an index rather than the table
space. Two simultaneous transactions can have different
views of the same database and different counts of various
sections of tables. The consistency of a transaction is
maintained with transaction id's which are not available
in the index. The only way to verify that an index entry
indicates a record that is committed and appropriate to a
particular transaction is to read the associated record.
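To make that concrete, here is a toy sketch in Python - the structures and names are made up for illustration, not Firebird's actual record or index formats. The point it shows: because index entries carry no transaction ids, every candidate record must be fetched before it can be counted.

```python
# Hypothetical sketch: record versions carry the id of the transaction
# that created them; index entries carry only key -> record id.
from dataclasses import dataclass

@dataclass
class RecordVersion:
    key: str
    created_by: int      # transaction id that wrote this version
    committed: bool

# The index knows nothing about transactions.
index = {"a": [1], "b": [2]}
records = {
    1: RecordVersion("a", created_by=3, committed=True),
    2: RecordVersion("b", created_by=9, committed=False),
}

def visible(rec: RecordVersion, my_txn: int) -> bool:
    """A version counts only if committed by an older transaction."""
    return rec.committed and rec.created_by < my_txn

def count_key(key: str, my_txn: int) -> int:
    # The index alone cannot answer this: each candidate record
    # must be read to check its transaction id.
    return sum(visible(records[rid], my_txn) for rid in index.get(key, []))

print(count_key("a", 8))  # 1 - transaction 3's version is visible
print(count_key("b", 8))  # 0 - transaction 9's version is uncommitted
```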

"Well," you say, "Why not put the transaction id in the
index?" Two reasons - it increases the size of index entries
considerably, makes all accesses slower, and triples the
number of page writes to update a record.


Suppose you've got some record - call it Sam - which
has a key value that has been changed four times by
transactions 3, 5, 7, and 9. 3 stored the record with a
key value of 'a'. 5 changed the value to 'b'. 7 changed
it back to 'a'. 9 changed it to 'f'. All transactions
committed sequentially.

Under the current index management scheme, there would be
three index entries pointing to Sam - 'a', 'b', and 'f'.
Note that the two instances of 'a' are represented by a
single index entry. Adding transaction ids to the record
versions would require a second 'a' entry.

That may sound trivial, but most key fields are stable
so most records have only one entry in each index. Having
to update each index each time a record is modified would
significantly increase the I/O load.

Suppose you're transaction 8 and you're looking for 'a's.
You'd find two, both pointing to Sam. Ok, fine, so you
add a filter to your count that eliminates duplicate references
to the same record.

Suppose you're transaction 6 and you're looking for 'a's.
You find the one created by transaction 3. There are no
duplicates, so you keep it. The fact that the actual value
that your transaction would see is 'b' never enters the
picture.
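A little simulation of the Sam example makes the failure visible (illustrative Python, not Firebird's structures; it assumes only the creating transaction's id is stored with each entry, as in the scheme being criticized):

```python
# (key, creating_txn) entries for record Sam, per the example above:
# txn 3 stored 'a', txn 5 changed it to 'b', txn 7 back to 'a', txn 9 to 'f'.
# All transactions committed sequentially.
sam_entries = [("a", 3), ("b", 5), ("a", 7), ("f", 9)]

def newest_visible(entries, my_txn):
    """The value this transaction actually sees: the newest version
    created by a transaction older than itself (all committed here)."""
    candidates = [(k, t) for k, t in entries if t < my_txn]
    return max(candidates, key=lambda kt: kt[1])[0]

def naive_index_count(entries, key, my_txn):
    """Count key matches by creation id alone, deduplicated per record."""
    return 1 if any(k == key and t < my_txn for k, t in entries) else 0

# Transaction 8 finds 'a' and dedups the two references - fine so far:
assert naive_index_count(sam_entries, "a", 8) == 1
# Transaction 6's true view of Sam is 'b' (txn 5's version) ...
assert newest_visible(sam_entries, 6) == "b"
# ... yet the index-only count still picks up the stale 'a' from txn 3:
assert naive_index_count(sam_entries, "a", 6) == 1
```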

OK, so keep both the id of the transaction that created the
version and the id of the transaction that obsoleted it.
That's bigger, sure, and will require even more I/O, since two
entries must be modified in every index at every modification
of a record, and entries must be made when a record is
deleted.

If counts are a semantic part of your application, then keep
them as logical parts of the schema rather than relying on
physical artifacts. Counts can be maintained deadlock free -
there are explanations in the mers archive on ibphoenix.com.
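One common recipe for that (this is my assumption of the technique, sketched in Python rather than quoted from the mers archive): instead of updating a single counter row, which makes concurrent writers collide, every transaction inserts its own +1/-1 delta row, and a reader sums them. A housekeeping job collapses the deltas now and then. Table and trigger names here are hypothetical.

```python
# Stand-in for a COUNT_DELTAS table: each transaction only ever
# INSERTs, so no two writers touch the same row.
deltas = []

def on_insert():
    deltas.append(+1)   # what an AFTER INSERT trigger would do

def on_delete():
    deltas.append(-1)   # what an AFTER DELETE trigger would do

def current_count():
    return sum(deltas)  # i.e. SELECT SUM(delta) FROM count_deltas

def consolidate():
    """Housekeeping: collapse all delta rows into one."""
    total = sum(deltas)
    deltas.clear()
    deltas.append(total)

for _ in range(5):
    on_insert()
on_delete()
consolidate()
assert current_count() == 4
```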

> > Min and max should be able to operate
> > equally well against ascending and descending indices. There are
> > typically only one or two more I/Os (10 to 20 milliseconds) involved
> > in locating the last entry in a balanced index versus the first entry.
> > These are obvious optimizations; when I have time I will dig into the
> > optimizer, identify the places to correct this, and post the suggested
> > change back here.

Be slightly careful of your assumptions here. The index structure
is dynamic and can not be locked. Walking down the left hand side
of the tree is reasonably easy because there's an "absolute 0" node
at the beginning of each level. Walking down the right hand side
is more difficult because between the time you pick up the right
hand pointer to the next level down and the time that you get that
page, the page may have split. In a split the old page is to the
left of the new page (part of the reason walking the left is easier).

Suppose you get down to the leaf level, find a target max value,
read the data and discover that the record is not appropriate for
your transaction - too old, too new, or rolled back. In getting
the record, you've lost your lock on the leaf page and have to
start again at the top.
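As a toy illustration of that restart - not Firebird's actual algorithm, just a sketch under the one-page-lock-at-a-time rule - imagine each level keeps a generation number that bumps when its rightmost page splits:

```python
def descend_right(levels, split_on_attempt=1):
    """Follow rightmost pointers level by level; restart from the root
    whenever the level we just read from has split underneath us."""
    attempt = 0
    while True:
        attempt += 1
        restart = False
        page = None
        for i, level in enumerate(levels):
            gen = level["gen"]          # remember the level's generation
            page = level["rightmost"]   # pick up the right-hand pointer
            # Simulate concurrent activity: the leaf splits during attempt 1,
            # and in a split the old page ends up LEFT of the new one.
            if attempt == split_on_attempt and i == len(levels) - 1:
                level["gen"] += 1
                level["rightmost"] = "new-right-leaf"
            if level["gen"] != gen:     # our pointer may now be stale
                restart = True
                break
        if not restart:
            return page, attempt

levels = [{"gen": 0, "rightmost": "root"},
          {"gen": 0, "rightmost": "old-right-leaf"}]
page, attempts = descend_right(levels)
assert (page, attempts) == ("new-right-leaf", 2)
```

Walking the left edge needs no such retry, because the "absolute 0" node at the start of each level never moves.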

The ODS was designed to be deadlock free. Deadlock free structures
can not loop in the pointer graph. Although indexes now contain
left as well as right pointers, the left pointers are not guaranteed
to work. The second rule of being deadlock free is to hold no more
than one page lock at any time. Perhaps you see why this is hard?

At 02:29 PM 2/1/2004, Alexandre Benson Smith wrote:

>I thought that because FB indexes are compressed (RLE encoded AFAIR) one
>could not traverse them in reverse.

It's not RLE, it's prefix and suffix compression. Trailing spaces
in text keys and trailing zeros are compressed in numeric (double
precision) keys. The prefix compression eliminates all leading
bytes that duplicate the preceding key value.
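A sketch of the idea (illustrative Python; Firebird's on-disk encoding differs in detail): each key stores a count of leading bytes shared with the preceding key, plus only the bytes that differ, and trailing spaces are dropped from text keys.

```python
def prefix_compress(keys):
    out, prev = [], ""
    for k in keys:
        k = k.rstrip(" ")   # suffix compression: drop trailing spaces
        shared = 0
        while shared < min(len(prev), len(k)) and prev[shared] == k[shared]:
            shared += 1
        out.append((shared, k[shared:]))   # (shared prefix length, tail)
        prev = k
    return out

def prefix_expand(entries):
    keys, prev = [], ""
    for shared, tail in entries:
        k = prev[:shared] + tail   # rebuild from the PRECEDING key only
        keys.append(k)
        prev = k
    return keys

keys = ["Alexander ", "Alexandre", "Alexis", "Ann"]
packed = prefix_compress(keys)
assert packed[0] == (0, "Alexander")   # first node keeps its full bytes
assert packed[1] == (7, "re")          # "Alexand" + "re"
assert prefix_expand(packed) == ["Alexander", "Alexandre", "Alexis", "Ann"]
```

Note why a single page can't be walked backward: each entry is defined relative to the one before it, so reading front to back is the only way to reconstruct the keys.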

Prefix compression is not performed on the first node on each
page. You can't walk a single page backward, but you can - or
could absent concurrency issues - walk the leaf level backward.

>Do we still need compressed indexes?
>What we have:
>1.) More entries per index page (very good)
>2.) Index uses less space (should disk space be treated as a high
>priority nowadays?)

It's not space, it's the I/O load. More entries per page -> fewer
reads to get an index range.
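Back-of-envelope arithmetic shows the effect (the page and entry sizes here are illustrative assumptions, not Firebird's real numbers):

```python
page_size = 4096
uncompressed_entry = 40      # bytes per index entry, assumed
compressed_entry = 12        # bytes after prefix/suffix compression, assumed
entries = 1_000_000

# Ceiling division: pages needed to hold all entries.
pages_uncompressed = -(-entries * uncompressed_entry // page_size)
pages_compressed = -(-entries * compressed_entry // page_size)

assert pages_uncompressed == 9766
assert pages_compressed == 2930
# Scanning the whole index range costs ~3.3x fewer page reads compressed.
```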

>An option on the create index statement to not compress could be a choice?

No - compression is not the reason why you can't walk the right side
of an index. Concurrent activity is the reason.

>The index could be stored compressed on disk-data-page, but once read the
>cached-page could store the uncompressed version of the same page ? If so,
>once those pages are in memory could we use the index on both directions ?

As above. That's not the problem.

> David Johnson again.
> > In this case, we may also be identifying a minor weakness in the
> > optimizer. Without changing the basic design premise, Count(*) should
> > run against the index with the smallest footprint that is guaranteed to
> > include pointers to all rows rather than the table space itself.

As above, not true.

Alexandre Benson Smith wrote:

>I have asked once why FB does not use just the index when some select asks
>just for indexed columns like:
>
>select name from person where name like 'Ale%'

In that case, it will use an index to locate all candidate records,
but must then read each of the candidates to ensure that it does
actually fall into the set of records available to the transaction.

>I thought we could have different kinds of tables (memory table for
>example), indexes (compressed, uncompressed, b*tree, hash, clustered (a la
>MSSQL), etcs...).

One of the major design goals for Firebird was simplicity and part
of simplicity is finding the best way to work with a limited set of
tuning parameters. In your list above, the only thing I agree with
is memory tables, and even then, I'm not at all sure they're necessary.
I see no benefit to uncompressed indexes. Bitmap index access greatly
reduces the need for clustered indexes without introducing the need
for the database designer to see into the future when defining his
table structures. And bitmap index access works for all indexes, not
just the primary key. Hash indexes are fine in memory, but an untuned
hash index is worse than no index at all.

Regards,


Ann