firebird-support - Re: [firebird-support] Database corruption

Subject	Re: [firebird-support] Database corruption
Author	Ann W. Harrison
Post date	2005-01-26T17:11:08Z

Jonathan Neve wrote:

>>
> A while back I brought up the subject of database corruption, and I was
> basically told that IB makes it theoretically impossible (due to careful
> writes) for a database to get corrupted, unless of course it's a matter
> of file system copy of the GDB, or something like that.

Which leads to the old question and answer: Q. What is the difference
between theory and practice? A. In theory, there is none.

However, both InterBase 6 and Firebird 1.0x have gone a long way toward
fixing corruptions that were common in InterBase 5.6. The most common
corruption in InterBase 5.6 hit any database file that grew beyond
4GB. The insert pointer for the database file wrapped around and
InterBase wrote new pages over the database header and the oldest
database pages. That was bad.
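
To make the wraparound concrete, here is a minimal sketch in Python. It
assumes the file offset was held in an unsigned 32-bit value, which would
match the 4GB limit; the page size and the function name are invented for
the illustration and are not taken from the InterBase source.

```python
PAGE_SIZE = 4096     # assumed page size, for illustration only
OFFSET_BITS = 32     # assumption: offset kept in an unsigned 32-bit value


def file_offset_32bit(page_number: int) -> int:
    """Byte offset of a page when the offset is truncated to 32 bits."""
    return (page_number * PAGE_SIZE) % (1 << OFFSET_BITS)


pages_in_4gb = (1 << OFFSET_BITS) // PAGE_SIZE

# The last page below 4GB lands where it should.
last_ok = file_offset_32bit(pages_in_4gb - 1)

# The first page past 4GB wraps to offset 0, on top of the header page,
# and the pages after it land on the oldest pages in the file.
wrapped = file_offset_32bit(pages_in_4gb)
next_wrapped = file_offset_32bit(pages_in_4gb + 1)

print(last_ok, wrapped, next_wrapped)
```

With these assumed numbers, page 1,048,576 is written at offset 0 and
the page after it at offset 4096, which is the overwrite pattern
described above.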

Firebird 1.0 fixed two other significant problems that were common
causes of corruption on Windows systems. The default on Windows changed
to forced write. Unlike *nix systems, Windows does not flush its page
cache until the file closes. Unless forced write is enabled, no changes
to a Windows database go to disk until the server shuts down. That's a
disaster waiting to happen. The SuperServer on Windows formerly opened
the database for shared write. It now uses exclusive write / shared
read mode. That keeps the server from opening a single file twice under
two different names.
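
As an analogy for what forced write buys you, here is a small Python
sketch. It is not how the server does it; the function name, the page
size and the "forced" parameter are made up for the example. The point
is only the difference between handing a page to the operating system
cache and waiting until the operating system has pushed it to the disk.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size, for illustration only


def write_page(fd: int, page_number: int, data: bytes, forced: bool) -> None:
    """Write one page; with forced=True, also flush it out of the OS cache."""
    os.lseek(fd, page_number * PAGE_SIZE, os.SEEK_SET)
    os.write(fd, data)
    if forced:
        # Without this call the page may sit in the OS cache and be lost
        # if the machine goes down hard.
        os.fsync(fd)


fd, path = tempfile.mkstemp()
try:
    write_page(fd, 2, b"\x01" * PAGE_SIZE, forced=True)
    size = os.fstat(fd).st_size
finally:
    os.close(fd)
    os.remove(path)

print(size)
```

With forced=False the call returns sooner, but a committed change is
only as safe as the operating system's cache.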

>
> Anyway, I have again got a DB corruption on a customer's PC, and so I'd
> like to figure out what it's due to: is it a bug, or is there something
> I can do to avoid it happening?
>
> I'm currently using IB 6.0.1.0, on Win 98.

And you almost certainly suffered from having committed changes that
were not written to disk because forced write is not the default for
InterBase 6.

>
> The machine was then shutdown (abnormally I presume, because there's no
> trace of any clean shutdown in the log), and restarted (as evidenced by
> the fact that at 9:00, the gardian was restarted):

As Alex said, the reports you described don't indicate terrible
corruption and you can probably just back up and restore the database -
changing the state of forced writes - and continue.

Let me explain a bit about some of the "horrible" looking errors,
particularly the "orphaned" errors. Even with forced write enabled, a
hard shutdown of a Firebird server will sometimes result in "orphans".
An orphan is a piece of storage that has been removed from a free
space list, but not used.

The goal of careful write is to have the database consistent after every
page write. To do so, the page write that indicates that storage is
used must occur before the write that actually uses it.

For example, a page must be removed from the free space page list before
it can be assigned to a table or index. When a page is released, it
must be removed from the index or table before it can be marked as free
on the free space list. Orphaned pages are pages that were in the
process of being allocated or released when the server crashed hard.
They're wasted space - until a backup/restore - but they don't indicate
any damage to data structures.
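
The ordering above can be sketched as a toy simulation in Python. The
class and method names are hypothetical, and the model is far simpler
than the real on-disk structures: one free space list, one table, and a
simulated crash between two page writes. It only shows that, with this
write order, a crash leaves an orphan and never a page that is both free
and in use.

```python
class Crash(Exception):
    """Simulated hard shutdown between two page writes."""


class Disk:
    """Toy model: one free space list and one table, each its own 'page'."""

    def __init__(self, total_pages):
        self.total = total_pages
        self.free = set(range(total_pages))  # free space list as on disk
        self.table = set()                   # pages owned by the table
        self.crash_after_next_write = False

    def _page_written(self):
        # Each call stands for one completed page write.
        if self.crash_after_next_write:
            self.crash_after_next_write = False
            raise Crash()

    def allocate(self, page):
        self.free.discard(page)   # write 1: page leaves the free space list
        self._page_written()
        self.table.add(page)      # write 2: page is assigned to the table
        self._page_written()

    def release(self, page):
        self.table.discard(page)  # write 1: page leaves the table
        self._page_written()
        self.free.add(page)       # write 2: page goes back on the free list
        self._page_written()

    def orphans(self):
        return set(range(self.total)) - self.free - self.table

    def double_booked(self):
        return self.free & self.table


disk = Disk(total_pages=4)
disk.allocate(0)                      # completes normally

disk.crash_after_next_write = True
try:
    disk.allocate(1)                  # dies between write 1 and write 2
except Crash:
    pass

disk.crash_after_next_write = True
try:
    disk.release(0)                   # dies between write 1 and write 2
except Crash:
    pass

print("orphans:", sorted(disk.orphans()))
print("double-booked:", sorted(disk.double_booked()))
```

Pages 0 and 1 end up as orphans. Reversing either pair of writes would
instead leave a page that the table uses while the free space list
still offers it to someone else, and that is real damage.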

In another example, when a back version of a record is released, the
pointer to it is cleared and that page written out before the page index
- the indicator that space on a data page is used - is cleared. If
the server crashes after the pointer is cleared and written and before
the page index is cleared and written, you get an orphaned back version.
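
The same idea for back versions, as a rough Python sketch. The layout is
hypothetical: a dictionary stands in for the page index, each entry is a
record version, and "back" is the pointer to the older version. The real
structures are more involved; this only shows why the crash window
produces an orphan and not a dangling pointer.

```python
class Crash(Exception):
    """Simulated hard shutdown between two page writes."""


def release_back_version(slots, primary, crash_between=False):
    """Drop the back version of the record stored in slot `primary`."""
    back = slots[primary]["back"]
    slots[primary]["back"] = None   # write 1: pointer to back version cleared
    if crash_between:
        raise Crash()
    del slots[back]                 # write 2: page index entry cleared


def orphaned_versions(slots, primary_slots):
    """Slots still marked as used that no record chain reaches."""
    reachable = set()
    for slot in primary_slots:
        while slot is not None:
            reachable.add(slot)
            slot = slots[slot]["back"]
    return set(slots) - reachable


# Slot 0 holds the current version, slot 1 its back version.
slots = {
    0: {"data": "v2", "back": 1},
    1: {"data": "v1", "back": None},
}

try:
    release_back_version(slots, primary=0, crash_between=True)
except Crash:
    pass

orphans_after_crash = orphaned_versions(slots, primary_slots=[0])
print(orphans_after_crash, slots[0]["data"])
```

After the simulated crash the current version is intact and readable,
and slot 1 is still marked as used although nothing points to it.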

The goal of the mend operation (gfix, or whatever) is to isolate damage
so the database can be backed up. It deliberately does not return the
space it releases to the free space list for pages or to the free space
on a page. The assumption is that something went horribly wrong and the goal
is to clean up enough that the metadata and data can be extracted and
the database recreated.
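
For reference, a typical sequence might look like the sketch below. It
is written as a small Python helper that only builds and prints the
command lines; it does not run them. The file names are hypothetical,
credentials are left out (the tools can take them from ISC_USER and
ISC_PASSWORD or from -user and -password), and the exact set of switches
is one plausible choice, not the only one.

```python
def recovery_commands(damaged_db, backup_file, new_db):
    """Command lines for a validate, mend, backup, restore cycle."""
    return [
        # Validate the whole database and report what is wrong.
        ["gfix", "-v", "-full", damaged_db],
        # Mend: mark damaged structures so that a backup can get past them.
        ["gfix", "-mend", "-full", "-ignore", damaged_db],
        # Back up, ignoring checksum errors and skipping garbage collection.
        ["gbak", "-b", "-v", "-ig", "-g", damaged_db, backup_file],
        # Recreate the database from the backup.
        ["gbak", "-c", "-v", backup_file, new_db],
        # Turn forced writes on for the recreated database.
        ["gfix", "-write", "sync", new_db],
    ]


commands = recovery_commands("damaged.gdb", "damaged.gbk", "rebuilt.gdb")
for argv in commands:
    print(" ".join(argv))
```

The restore step is what gives back the orphaned space, since the new
database is built from scratch out of the extracted metadata and data.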

Regards,

Ann