firebird-support - Re: [firebird-support] Careful Write: (One for Ann!)

Subject	Re: [firebird-support] Careful Write: (One for Ann!)
Author	Ann W. Harrison
Post date	2009-02-10T16:56:08Z

Leng,

>
> Currently, I am enrolled on a database class. One of the
> requirements is to report to the class a particular RDBMS.
> My groupmates and I chose Firebird. Can we seek your help
> on how "careful write" is performed on Firebird. Any
> information will be greatly appreciated.
>

The simple answer is "by writing pages in the correct order",
but that probably doesn't help. The underlying rule for careful
write is that you must write the page pointed at before you
write the page that points at it.

Firebird uses careful write to keep the database on disk
correct at all times. Assuming that the disk subsystem
doesn't lie about the order in which it writes pages and
there are no bugs, you can crash a Firebird server at any
point and the database will restart without corruption.
Some space may be unusable for reasons we'll get into,
but the database will be correct and will include all
committed changes made before the crash.

Here's an example of careful write in action. When Firebird
creates a data page, it calls a routine called fake_page (I
think) to get a page size buffer which Firebird formats as
a data page (DPG) and then writes new record versions there -
all this is in memory, and uncommitted.

To put the new page on disk, Firebird finds a free page in
the database from the active page inventory page (PIP).
Then it must change the state of the page on the PIP,
write the data page, then write the page number on a
pointer page (PPG) for the table, making it known as a
part of the table.

The order of page writes is PIP first, so the page is
marked as being in use and can't be allocated by some
other thread, then DPG, then PPG. If there were index
entries for the newly created records on the page, they
are written next on index pages (IDX).

All pages must be on disk before the changes are committed.

If there is a crash before the PIP is written, nothing has
changed. If the crash comes between writing the PIP and
writing the DPG and PPG, then that page becomes unavailable
until someone runs gfix, but everything else is OK. If
there is a crash after writing the DPG but before the PPG,
the situation is the same - the DPG is allocated but not
used. All the records on the page belong to transactions
that were rolled back, so there's no data loss.

If there's a crash after writing the PPG but before writing
the IDX pages, the page is part of the table, but all records
on it belong to a rolled back transaction, so they will be
garbage collected eventually.
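
To make the crash cases above concrete, here is a small
Python sketch. It is only an illustration, not Firebird code:
the names WRITE_ORDER and state_after_crash are made up, and
the "disk" is just the set of page writes that completed
before the crash.

```python
# Toy model (not Firebird code): pages reach disk in this order.
WRITE_ORDER = ["PIP", "DPG", "PPG", "IDX"]


def state_after_crash(writes_completed):
    """Describe the database if a crash hits after N page writes."""
    on_disk = set(WRITE_ORDER[:writes_completed])
    if "PIP" not in on_disk:
        return "nothing has changed"
    if "PPG" not in on_disk:
        # Covers a crash after the PIP and a crash after the DPG.
        return "page allocated but unused until gfix is run"
    if "IDX" not in on_disk:
        return "page is in the table; its records get garbage collected"
    return "all pages on disk; the transaction may commit"


for n in range(len(WRITE_ORDER) + 1):
    print(n, state_after_crash(n))
```

In every case the pages already on disk only point at pages
that are also on disk, which is the whole point of the rule.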

Consider the case of an index page split. Actually, for
that, check the Firebird for Database Experts articles at
ibphoenix. They've got pretty colored pictures of index
splits and the writes they cause.

All the ordering of page writes is controlled by a dependency
graph - a structure that maintains the order of dependencies
among unwritten pages. In the case we just looked at, the
IDX pages depend on the PPG which depends on the DPG which
depends on the PIP. If some other transaction makes a change
to one of those IDX pages and commits, it will force the write
of the IDX which can't happen until the PPG is written which
can't happen until the DPG is written, which can't happen
until the PIP is written. So asking for a write of an IDX
causes these writes in this order: PIP, DPG, PPG, IDX.
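
A minimal sketch of that behavior in Python, assuming a toy
cache where each dirty page remembers which pages must reach
disk before it. The class and method names (PageCache,
add_dependency, write) are hypothetical and are not the names
used in the Firebird source. The sketch also assumes the graph
has no loops.

```python
class PageCache:
    """Toy cache of dirty pages with write-order dependencies."""

    def __init__(self):
        self.must_precede = {}  # page -> pages that must be written first
        self.dirty = set()
        self.disk_log = []      # order in which pages reached "disk"

    def mark_dirty(self, page):
        self.dirty.add(page)
        self.must_precede.setdefault(page, set())

    def add_dependency(self, page, written_first):
        """Record that `written_first` must reach disk before `page`."""
        self.mark_dirty(page)
        self.mark_dirty(written_first)
        self.must_precede[page].add(written_first)

    def write(self, page):
        """Write `page`, first writing every dirty page it depends on."""
        if page not in self.dirty:
            return
        for dep in sorted(self.must_precede[page]):
            self.write(dep)
        self.disk_log.append(page)
        self.dirty.discard(page)
        self.must_precede[page].clear()


cache = PageCache()
cache.add_dependency("DPG", "PIP")  # DPG depends on PIP
cache.add_dependency("PPG", "DPG")  # PPG depends on DPG
cache.add_dependency("IDX", "PPG")  # IDX depends on PPG
cache.write("IDX")
print(cache.disk_log)  # ['PIP', 'DPG', 'PPG', 'IDX']
```

Asking for the IDX alone is enough to pull the other three
pages out ahead of it.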

So each page has its place in the dependency graph and will
cause the pages it depends on to be written before it. That
graph also shows potential loops - page A must be written
before page B which must be written before page C which must
be written before page A, resulting in an irredeemable mess.

When the dependency graph shows that the next entry will cause
a loop, Firebird forces out enough pages to break the loop
before entering the new dependency. Those writes are necessary
only to make careful write possible. Firebird avoids the cost
of writing and reading a recovery log at the expense of sometimes
writing database pages that could be deferred in other schemes.
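
Here is a rough Python sketch of breaking a loop before it
forms. It is a toy model under my own assumptions: the names
(PageCache, depends_on, add_dependency) are hypothetical, and
the real code may pick the pages to force out differently.
The idea is only that a new dependency is checked against the
graph, and if it would close a loop, the page that has to go
first is written immediately, along with everything it was
waiting on.

```python
class PageCache:
    """Toy dependency graph; the keys of must_precede are dirty pages."""

    def __init__(self):
        self.must_precede = {}  # page -> pages that must be written first
        self.disk_log = []      # order in which pages reached "disk"

    def depends_on(self, page, target):
        """True if `page` transitively needs `target` written first."""
        stack, seen = [page], set()
        while stack:
            current = stack.pop()
            if current == target:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.must_precede.get(current, ()))
        return False

    def write(self, page):
        """Write `page` after the dirty pages it depends on."""
        if page not in self.must_precede:
            return  # not dirty, nothing to write
        for dep in sorted(self.must_precede.pop(page)):
            self.write(dep)
        self.disk_log.append(page)

    def add_dependency(self, page, written_first):
        """Require `written_first` on disk before `page`."""
        if self.depends_on(written_first, page):
            # The new entry would close a loop: force pages out now.
            self.write(written_first)
        self.must_precede.setdefault(written_first, set())
        self.must_precede.setdefault(page, set()).add(written_first)


cache = PageCache()
cache.add_dependency("B", "A")  # A before B
cache.add_dependency("C", "B")  # B before C
cache.add_dependency("A", "C")  # C before A would close the loop
print(cache.disk_log)           # ['A', 'B', 'C'] were forced out
```

After the forced writes the only remaining dependency is that
C goes before A, so the graph is a simple chain again.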

Good luck,

Ann