Subject Re: [Firebird-Architect] Blocking garbage collection, read committed transactions, and blobs
Author Ann W. Harrison
Ann W. Harrison wrote:
> ...
> about reducing the impact of read-committed transactions on garbage
> collection.... Someone clever

Actually Alexander Klenin

> suggested that we keep a separate "oldest active blob" number....

A summary of the previous message is

1) for consistency, the engine can not garbage collect records stored by
any transaction that was active at the start of the oldest running
concurrency transaction.

2) garbage collection generally does not affect records visible to
read-committed transactions, so those transactions should not block
garbage collection at all.

3) The exception is that garbage collecting blob data can cause errors
for a read-committed transaction in this case:

a) transaction A reads a committed record version, saving the blob
id. If the blob is large, transaction A can start to read it.
b) transaction B deletes or modifies that record. The modification
changes the contents of the blob.
c) transaction B commits
d) transaction C garbage collects the record, removing the old
versions and the blob
e) transaction D stores a record, record version, or blob and
reuses the space held by the blob.
f) transaction A attempts to read or continue to read the blob
using the old blob id.

At that point transaction A gets a blob not found error, a wrong type
page error, a database corruption error, or data from the wrong blob.
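
As a rough illustration of steps (a) through (f), here is a toy model
in Python. It is only a sketch: PageStore and its methods are made up
for this message, and the real on-disk structures are far more
involved. The one assumption that matters is that a blob id is
effectively an address that can be handed out again after garbage
collection.

```python
class PageStore:
    """Toy page allocator: a blob id is just the page number it was stored on."""

    def __init__(self):
        self.pages = {}      # page number -> (kind, payload)
        self.free = []       # page numbers released by garbage collection
        self.next_page = 1

    def store(self, kind, payload):
        # Reuse a freed page if there is one, as step (e) describes.
        if self.free:
            page_no = self.free.pop()
        else:
            page_no = self.next_page
            self.next_page += 1
        self.pages[page_no] = (kind, payload)
        return page_no

    def release(self, page_no):
        # Garbage collection frees the page for reuse.
        del self.pages[page_no]
        self.free.append(page_no)

    def read_blob(self, blob_id):
        if blob_id not in self.pages:
            raise LookupError("blob not found")
        kind, payload = self.pages[blob_id]
        if kind != "blob":
            raise TypeError("wrong page type")
        return payload


store = PageStore()
old_blob = store.store("blob", "old contents")   # (a) A saves this blob id
new_blob = store.store("blob", "new contents")   # (b), (c) B replaces the blob, commits
store.release(old_blob)                          # (d) C garbage collects the old blob
store.store("blob", "someone else's data")       # (e) D reuses the space
print(store.read_blob(old_blob))                 # (f) A reads data from the wrong blob
```

If D had stored a record instead of a blob, the same read would fail
with the wrong page type error, and if nothing had reused the space it
would fail with blob not found.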

A concurrency transaction needs to block garbage collection at the
oldest transaction active when it started, but a read-committed
transaction only needs to prevent the garbage collection of blobs that
were committed when it started. Records without blobs can be garbage
collected without affecting read-committed transactions.

In fact, some record versions with blobs can be garbage collected
without affecting read-committed transactions. When a record containing
a blob is stored, the blob is stored first, then the record is stored
with the blob_id in the blob field. If that record is updated without
changing the blob, the new version maintains the same blob id, and there
is only one copy of the blob. In that case, garbage collecting the old
record version won't affect a read-committed transaction that holds the
blob id from the old record.

During garbage collection, the engine builds a list of blob ids that
should not be garbage collected because they are used by newer record
versions. That is called the "staying" list. That list can be used to
decide whether garbage collecting a record version that contains a blob
id will create problems for read-committed transactions. If all blob
ids in the record version are in the staying list, then the record
version can be garbage collected. Otherwise, it can't.
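
The staying list test could be sketched like this in Python. The
function names and the dictionary layout of a record version are
invented for illustration; the engine's own structures are different.

```python
def build_staying_list(newer_versions):
    """Collect the blob ids referenced by record versions that will remain."""
    staying = set()
    for version in newer_versions:
        staying.update(version["blob_ids"])
    return staying


def safe_for_read_committed(old_version, staying):
    """True if every blob id in the old version is still used by a newer one.

    A version with no blobs passes trivially, because all() of an empty
    sequence is True.
    """
    return all(blob_id in staying for blob_id in old_version["blob_ids"])


staying = build_staying_list([{"blob_ids": [10]}])

# Updated without touching the blob: the old version shares blob id 10.
print(safe_for_read_committed({"blob_ids": [10]}, staying))

# The update replaced the blob: blob id 11 lives only in the old version.
print(safe_for_read_committed({"blob_ids": [11]}, staying))
```

The first check passes and the second does not, which matches the two
cases described above.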

Assuming there were a way to get the information, let's suppose the
engine has two limits: the GC limit, which is equivalent to the current
oldest active (or oldest snapshot - that's a different conversation),
and the BLOB limit, which is the oldest blob that must be preserved for
read-committed transactions. The rules for garbage collection would be

A) no record version above the GC limit can be garbage collected.
B) record versions below the GC limit and above the BLOB
limit can be garbage collected as long as they contain no blobs
or all blobs they do contain are on the staying list.
C) any record version older than the BLOB limit can be garbage
collected.
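
Rules A through C can be written as one decision function. This is a
sketch in Python with hypothetical names. It assumes the candidate is
already a superseded back version, and it treats a version exactly at
a limit as being above it, which is my guess at the boundary rather
than something the rules spell out.

```python
def can_garbage_collect(version_txn, blob_ids, gc_limit, blob_limit, staying):
    """Apply rules A, B and C to one old record version.

    version_txn: id of the transaction that stored the version
    blob_ids:    blob ids held in that version (possibly empty)
    staying:     blob ids still referenced by newer versions
    """
    if version_txn >= gc_limit:       # rule A: a concurrency transaction may need it
        return False
    if version_txn < blob_limit:      # rule C: older than any read-committed interest
        return True
    # Rule B: between the limits, only if no blob would disappear.
    return all(blob_id in staying for blob_id in blob_ids)


# GC limit 40, BLOB limit 30, blob id 7 not on the staying list.
print(can_garbage_collect(50, [], 40, 30, set()))    # rule A blocks it
print(can_garbage_collect(35, [], 40, 30, set()))    # rule B, no blobs
print(can_garbage_collect(35, [7], 40, 30, set()))   # rule B, blob would be lost
print(can_garbage_collect(20, [7], 40, 30, set()))   # rule C
```

Rule A is checked first so that it still wins if the BLOB limit ever
ends up above the GC limit.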

In a running system, the limit of garbage collection is maintained in
the lock table. The database block and database header contain values
for the limit, but the information from which those values are computed
is in the lock table. It is kept there because, especially for classic
processes, the lock table is the cheap place to keep volatile shared data.

The limit of garbage collection is currently set this way: when a
transaction (regardless of type) starts, it puts the transaction id of
the oldest transaction currently running in the lock data space in the
lock it takes on its own transaction id. Whenever a transaction starts,
the engine surveys the lock table, and sets the current garbage
collection limit to the lowest value it finds in a transaction lock block.
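
In Python, the current scheme looks roughly like this. The lock table
is reduced to a dictionary from transaction id to lock data, and the
function names are invented; this models the bookkeeping only, not the
lock manager.

```python
def start_transaction(lock_table, txn_id):
    """Take the transaction lock, storing the oldest running transaction id."""
    # Transaction ids only grow, so the smallest key is the oldest running
    # transaction. With nothing else running, that is the new one itself.
    oldest_running = min(lock_table, default=txn_id)
    lock_table[txn_id] = oldest_running


def end_transaction(lock_table, txn_id):
    del lock_table[txn_id]


def gc_limit(lock_table):
    """Survey the lock table for the lowest value in any transaction lock."""
    return min(lock_table.values(), default=None)


locks = {}
start_transaction(locks, 10)
start_transaction(locks, 11)      # records 10 as the oldest running
end_transaction(locks, 10)
start_transaction(locks, 12)      # records 11 as the oldest running
print(gc_limit(locks))            # still 10, held by transaction 11
end_transaction(locks, 11)
print(gc_limit(locks))            # now 11, held by transaction 12
```

Note that transaction 11 holds the limit at 10 whatever its type,
which is the behavior the rest of this message tries to relax for
read-committed transactions.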

If the engine divided its survey into two parts - the lowest transaction
id in lock blocks that belong to concurrency transactions and the lowest
transaction id in lock blocks that belong to read-committed
transactions, it could use the first as the limit on all garbage
collection (GC limit) and the second as the limit on blob garbage
collection (BLOB Limit).

However there is no practical way to determine from the lock whether it
represents a read-committed transaction or a concurrency transaction.
Nor is there any reasonable way to add that information without
seriously breaking the boundary layers between the lock manager and the
engine. Since I think that tying into the open source distributed lock
manager (OSDLM) is going to be important, and since we have a good
chance of doing that because our lock management is very similar to the
VMS distributed lock manager which the OSDLM emulates, I don't want to
suggest hacks that would complicate that integration.

So, suppose we add another lock series, just to hold the blob limit
value for read-committed transactions. Those transactions would put a
minus one in the lock data for their transaction lock and take out
another lock in the new series. That lock would contain the transaction
id of the oldest transaction currently running. When computing garbage
collection limits, the engine queries both lock series, taking the GC
Limit from the transaction locks and the BLOB Limit from the new series.
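
Here is a sketch of the two lock series in Python. The class and
method names are hypothetical, each lock series is modeled as a
dictionary, and returning None when a series imposes no limit is my
assumption; the proposal doesn't say what the engine does in that case.

```python
NO_LIMIT = -1   # the "minus one" a read-committed transaction puts in its lock data


class LockSeries:
    def __init__(self):
        self.txn_locks = {}    # transaction id -> lock data (existing series)
        self.blob_locks = {}   # transaction id -> lock data (proposed new series)

    def start(self, txn_id, read_committed):
        # Oldest running transaction of either type, or the new one if alone.
        oldest_running = min(self.txn_locks, default=txn_id)
        if read_committed:
            self.txn_locks[txn_id] = NO_LIMIT
            self.blob_locks[txn_id] = oldest_running
        else:
            self.txn_locks[txn_id] = oldest_running

    def end(self, txn_id):
        self.txn_locks.pop(txn_id, None)
        self.blob_locks.pop(txn_id, None)

    def limits(self):
        """Return (GC limit, BLOB limit) from a survey of both series."""
        gc_values = [v for v in self.txn_locks.values() if v != NO_LIMIT]
        gc = min(gc_values, default=None)
        blob = min(self.blob_locks.values(), default=None)
        return gc, blob


series = LockSeries()
series.start(10, read_committed=False)
series.start(11, read_committed=True)    # blob lock records 10
series.end(10)
series.start(12, read_committed=False)   # transaction lock records 11
print(series.limits())                   # (11, 10)
```

With a single series, transaction 11 would have held the limit at 10
for everything. Here it holds back only versions whose blobs are not
on the staying list.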

When a read-committed transaction does a commit-retaining (or rollback
retaining), it creates a new transaction lock block with a minus one for
lock data, but retains the lock it has in the new series. Thus blob-ids
would be reliable across a commit-retaining, which makes sense to me,
since the general description is that a commit-retaining "retains the
context" of the transaction.
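
The commit-retaining case might look like this in Python. The two
dictionaries stand in for the two lock series, and re-keying the
retained lock under the new transaction id is just how this toy model
tracks ownership; the real lock would simply stay held.

```python
NO_LIMIT = -1   # lock data a read-committed transaction puts in its transaction lock


def commit_retaining(txn_locks, blob_locks, old_txn_id, new_txn_id):
    """Swap the transaction lock but keep the blob-limit lock and its value."""
    del txn_locks[old_txn_id]
    txn_locks[new_txn_id] = NO_LIMIT
    # The lock in the new series is retained, so its value does not advance.
    blob_locks[new_txn_id] = blob_locks.pop(old_txn_id)


txn_locks = {20: NO_LIMIT}
blob_locks = {20: 15}      # transaction 20 started when 15 was the oldest running
commit_retaining(txn_locks, blob_locks, old_txn_id=20, new_txn_id=25)
print(txn_locks, blob_locks)
```

The BLOB limit contributed by this context stays at 15 after the
commit-retaining, so blob ids read before it remain valid.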

We eliminate the current "pre-committed" status for read-only read
committed transactions - it causes "blob not found" errors.

This technique makes blob retrieval reliable for all read-committed
transactions (fixing a bug) and keeps all read-committed transactions
from blocking most garbage collection. In short, it's better than what
InterBase is doing, and it works.

At least, I think it works.