Subject: Re: [firebird-support] Firebird 3 - Auto Garbage collection with Sweep interval = 0
Adding just a bit more detail to the description of garbage collection in Firebird.
For those who know little about Firebird, when a record is updated, Firebird
does not replace the old version with the new. Instead, it creates a new version
and chains the old version behind the new. Each record version is marked with
the identity of the transaction that created it. Concurrent transactions can see
different versions of the same record. Each transaction sees the newest version
that was committed when the reading transaction started (in concurrency isolation)
or the newest committed version (in read committed isolation).
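To make the two visibility rules concrete, here is a toy model in Python. It is only a sketch: the names (`Version`, `visible_version`, `committed_at`, `snapshot_start`) are invented for illustration, and it uses logical commit times as a simplification rather than the transaction state Firebird actually consults. It also ignores the fact that a transaction can see its own uncommitted changes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Version:
    value: str
    txn: int                            # id of the transaction that created this version
    older: Optional["Version"] = None   # next (older) version in the chain


def visible_version(newest: Optional[Version],
                    committed_at: Dict[int, int],
                    snapshot_start: Optional[int] = None) -> Optional[Version]:
    """Walk the chain from newest to oldest and return the first readable version.

    committed_at maps a transaction id to its logical commit time; a missing
    entry means that transaction has not committed.
    snapshot_start is the reader's start time for concurrency isolation,
    or None for read committed.
    """
    v = newest
    while v is not None:
        commit_time = committed_at.get(v.txn)
        if commit_time is not None and (
            snapshot_start is None or commit_time <= snapshot_start
        ):
            return v
        v = v.older
    return None
```

With a chain of three versions where the newest is still uncommitted, a concurrency reader that started before the second commit gets the oldest version, while a read committed reader gets the second.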
The challenge is to remove versions too old for any active transaction to read
without spending all day doing it. The answer is garbage collection - which can
be "cooperative", "background", or "combined." Combined (as you might guess)
decides on a case by case basis whether to do cooperative or background
clean-up. Classic and SuperClassic modes only support cooperative garbage
collection.
Although it is possible to create a connection that does not remove unlovable
ancient record versions, generally, any time a connection touches a record with
back versions too old to be seen by any active transaction, it will cause those old
versions to be removed. (The exception is the mode gbak uses to create a backup
with the -g switch, when the intention is to restore the backup as the new working
copy of the database. Cleaning a database you plan to discard is a waste of time.)
When a normal transaction reads a record, it checks for back versions of the
record that were created by a transaction that no running transaction cares about.
Specifically, it looks for the newest version that every running transaction
can read. Any versions older than that one are a pure waste of space. If it tries to
read a record marked as deleted, it determines whether the deletion is visible to
all active transactions and, if so, causes the record to be removed.
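The check described above can be sketched in Python. This is a simplified model under stated assumptions, not Firebird's code: `Version`, `garbage_versions`, and `oldest_active_start` are hypothetical names, and "every running transaction can read it" is approximated as "committed before the oldest active transaction started".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    value: str
    commit_time: Optional[int]          # None = creating transaction has not committed
    older: Optional["Version"] = None   # next (older) version in the chain


def garbage_versions(newest: Optional[Version],
                     oldest_active_start: int) -> List[Version]:
    """Return the versions that no running transaction can need.

    Walk from newest to oldest until reaching a version committed before the
    oldest active transaction started. Every running transaction can read
    that version, so everything behind it in the chain is garbage.
    """
    v = newest
    while v is not None:
        if v.commit_time is not None and v.commit_time <= oldest_active_start:
            break   # v is the version to keep
        v = v.older

    garbage = []
    if v is not None:
        v = v.older
        while v is not None:
            garbage.append(v)
            v = v.older
    return garbage
```

Note that the version every transaction can read is kept, and so is anything newer than it. Only the tail of the chain is returned.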
Cooperative garbage collection.
If the system is using cooperative garbage collection, the active transaction
breaks the chain of back versions after the version that all running transactions
can read, releases the space used by the old versions to the system for reuse,
and cleans up any index entries that reference that version.
If there are few back versions to remove, they are likely to be on the same
page with the record version that the transaction read, so the I/O overhead is small.
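The three steps of cooperative clean-up (break the chain, release the space, tidy the index) might look like this in a toy Python model. All names here are made up for illustration, and the index is reduced to a dictionary that counts how many versions carry each key value. That is an assumption chosen to show that an index entry can only go once no surviving version uses the key.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Version:
    key: str                            # indexed column value stored in this version
    commit_time: Optional[int]          # None = creating transaction has not committed
    older: Optional["Version"] = None   # next (older) version in the chain


def readable_by_all(v: Version, oldest_active_start: int) -> bool:
    """True if v was committed before the oldest active transaction started."""
    return v.commit_time is not None and v.commit_time <= oldest_active_start


def collect_cooperatively(newest: Optional[Version],
                          oldest_active_start: int,
                          index: Dict[str, int]) -> int:
    """Truncate the chain after the version every reader can see.

    index maps a key value to the number of versions that carry it (a toy
    stand-in for real index entries). Returns the number of versions freed.
    """
    keep = newest
    while keep is not None and not readable_by_all(keep, oldest_active_start):
        keep = keep.older
    if keep is None:
        return 0                        # nothing is old enough to remove

    doomed = keep.older
    keep.older = None                   # break the chain

    freed = 0
    while doomed is not None:
        index[doomed.key] -= 1          # this version no longer needs its entry
        if index[doomed.key] == 0:
            del index[doomed.key]       # no version left with this key value
        freed += 1                      # space goes back for reuse
        doomed = doomed.older
    return freed
```

In this model the reading transaction does all of that work itself, which is cheap for one or two back versions and painful for hundreds.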
Background garbage collection.
If an unlucky transaction happens to read a record that has many back versions,
it has to do a lot of work that it wasn't planning on doing. In SuperServer configuration,
Firebird can run a background thread that cleans up old versions. When a user
transaction runs into a record that needs garbage collection, it puts that record's
identifier on a list for the background thread to clean up. When the background
thread cleans up a record, it cleans up all the records on that page.
That sounds like a big win, but on a busy system the background garbage thread
can fall behind. Think of a strike by sanitation workers.
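The hand-off between user transactions and the background thread can be sketched with a queue and a worker thread in Python. This is a rough illustration only: `RECORDS_PER_PAGE`, `page_of`, `read_record`, and `gc_worker` are hypothetical, and the dictionary of back-version counts stands in for real data pages.

```python
import queue
import threading
from typing import Dict

RECORDS_PER_PAGE = 4   # hypothetical; real page capacity varies


def page_of(record_id: int) -> int:
    """Toy mapping from a record identifier to the page that holds it."""
    return record_id // RECORDS_PER_PAGE


def read_record(record_id: int,
                back_versions: Dict[int, int],
                work: "queue.Queue",
                lock: threading.Lock) -> None:
    """A user transaction reads a record and, if it sees garbage,
    hands the record off instead of cleaning it up itself."""
    with lock:
        needs_gc = back_versions.get(record_id, 0) > 0
    if needs_gc:
        work.put(record_id)


def gc_worker(work: "queue.Queue",
              back_versions: Dict[int, int],
              lock: threading.Lock) -> None:
    """Background thread: for each queued record, clean every record
    on the same page. A None entry tells the worker to stop."""
    while True:
        record_id = work.get()
        if record_id is None:
            work.task_done()
            break
        page = page_of(record_id)
        with lock:
            for rid in back_versions:
                if page_of(rid) == page:
                    back_versions[rid] = 0   # drop that record's back versions
        work.task_done()
```

If readers queue records faster than the worker drains them, the queue simply grows, which is the "falling behind" problem in miniature.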
Some applications tend not to revisit deleted records. Well, duh... why would you?
As a result, a page full of deleted stubs and their back versions may never be garbage
collected either cooperatively or in background. A full table scan, whether it is part
of a sweep, part of a backup, or run programmatically, will cause deleted stubs to be removed.