firebird-support - Re: [firebird-support] Oldest Snapshot and Performance Issues

Subject	Re: [firebird-support] Oldest Snapshot and Performance Issues
Author	Ann W. Harrison
Post date	2005-10-25T20:07:56Z

Greg Kay wrote:

>
> We are using Classic server 1.52 on a Linux 8 Gb dual CPU opteron 242
> with Windows clients. There are about 110 users with 450 connections
>
> Database header page information:
> Oldest transaction 22951
> Oldest active 319143
> Oldest snapshot 317998
> Next transaction 572981
>
>
> My question is why is the "Oldest snapshot" considerably less than the
> "Oldest active"? My understanding is that the "Oldest snapshot" should
> be close to the "Oldest Active".

Sigh. The serious problem is the difference between Next
transaction and Oldest active. But to answer your question ...

Each transaction has a lock on its own identity. Each lock has
a name, which is the transaction id of the transaction that owns
the lock, and a data space in which it keeps the identity of the
oldest transaction that was running when it started.

When a transaction starts, it queries the the transaction locks
in the lock table, looking for two pieces of information:
1) the identity of the oldest transaction currently running and
2) the identity of the oldest transaction that was running when
any active transaction started.

The identity of the oldest running transaction is the minimum
value of the lock name (for the transaction lock series, but
"lock series" is an advance construct and won't be discussed
until next semester.)

The identity of the oldest transaction running when any active
transaction started is the data from the lock with the lowest
name - or the minimum value of all data values in the transaction
lock series.

The transaction puts the first value (oldest currently running)
in its transaction lock data area. It uses the second value to
determine what old versions can be garbage collected. Anything
older than that value can be, nothing equal to or greater can,
unless it's know to have rolled back.

OK so far? That was the easy part. It was designed for
classic, where each connection has its own process but the
state of transactions has to be shared among processes.

The oldest active is kept individual transaction blocks.
However, new transactions get their transaction number
from the database header page, so the header page must
be written whenever a transaction starts. Since the whole
page is written, and since people often want to know the
state of transaction activity, the oldest active number is
also written on the database header page when a transaction
starts.

And, to make bookkeeping easier for a shared server, the
database block also tracks the oldest active. But for
reasons unclear to me, it tracks both the oldest active,
and the transaction that was oldest active most recently,
which it calls the oldest snapshot. (I think this is a
bug). And that value - oldest snapshot - is also written
into the database header page.

In a shared server, the oldest active and the oldest
snapshot move along pretty smoothly together, barring
some unusual combination of activity - lots of commits
of old transactions just before the next transaction
starts. In Classic, however, the individual processes
have different ideas of what the previous oldest active
(aka Oldest Snapshot) was because transactions start in
different processes, so different processes will have
read the lock table at different times. Given the
number of connections and clients, that probably explains
the difference you see.

>
> Also, assuming we can improve the transaction movement, what else can
> we do to improve performance?

Firebird 2 improves the performance of garbage collection in
two ways. First, indexes are restructured to eliminate the
current ~n*n cost of removing duplicate entries. Second,
the garbage collect thread is invoked only if removing old
versions requires reading additional pages. On-page garbage
collection is done cooperatively - and much faster.

> Based on the recent article on the Lock Manager in
> "The Interbase and Firebird Developer Magazine" we have increased the
> lock manager settings to the maximum with considerable improvement
> (hash lengths have gone from min 700, avg 800, and max 900, to min 20,
> avg 40, or max 60, and the mutex wait is down to 2.5%). Is there
> anything else?
>

Lobby the developers to increase the maximum width of the hash
table.

Regards,

Ann

Note for those anxious to get a head start on next semester -
data that can be lost without corrupting the database is
communicated through the lock table. It's a shared memory
structure and really fast. If the server crashes, losing
the identity of the oldest transaction active at the time
is no big deal, since, by definition, after a crash, there
are no active transactions. Losing the next transaction id
is a problem, so the next transaction id must be written to
stable storage every time it changes. That's why the next
transaction id is in the database header page.