firebird-support - Re: [firebird-support] Database corruption

Subject	Re: [firebird-support] Database corruption
Author	Helen Borrie
Post date	2004-09-06T01:16:32Z

At 07:35 PM 5/09/2004 -0400, you wrote:

>Using Firebird v1.5.1.4417 in classic mode (fb_inet_server.exe) with forced
>writes enabled, on a dual Xeon WinXP Server and raid 5 array. The database
>is slightly larger than 14GB.
>
>Each night, we have an automated backup/restore/verify script that is run
>against the database.
>
>This morning, I found that the backup had failed with the following error:
>
>database file appears corrupt ()
>-wrong page type
>-page 1653541 is of wrong type (expected 7, found 5)
>
>
>I ran gfix -v -f on the file, and received a message regarding a number of
>index and data pages with problems. I then ran gfix -m -i, and received the
>same message. Running gfix -v -f still gave the same errors.
>
>I then tried to run gbak -g -ig, and it seems to have worked. I am in the
>process of running the restore to see if the above worked.
>
>I have heard that it is not possible to corrupt a Firebird database with
>forced writes on, but I have here an example of it occurring.

That's a myth. It only avoids one source of corruption. I can tell you
six other ways to corrupt a Firebird database. That's without going into
the possibilities in the cases where old databases have been hooked up to
new server versions... (dialect differences, ODS differences...)

> We also had a corrupt database in Jan or Feb of this year, though that
> was with a beta of 1.5 super server on older server hardware (though
> still forced writes and raid 5).

Have you scanned the disks since then?

>With a database of this size, I can't afford to run
>gfix/gfix/gfix/backup/restore/gfix, as that is at least an eight hour
>process. Since the earlier corruption this year we have maintained the
>restored databases from the previous seven days just in case we run into a
>catastrophic problem that causes us to 'roll back' to yesterday's database.
>Either case (eight hours to repair or a full day of data lost) is
>unacceptable.

Grab the 1.5 Quick Start Guide from the documentation area of the main
Firebird website to look at other ways the environment or your users'
practices might be causing the corruption.

Classic on Windows is even vulnerable to the path string bug, so beware of
users who install third-party Admin tools and use them to mess around with
data and metadata. The only sure way to get past this is to configure
DatabaseAccess to None and enforce the use of aliases. But, that said, if
you have a gung-ho user on site with owner privileges and a box of tools,
s/he is likely to find another way to break things.
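
As a rough illustration of the DatabaseAccess/alias setup described above, the two config files might look like the fragment below. The alias name and path are hypothetical; only the DatabaseAccess parameter and the "alias = path" layout of aliases.conf are taken from Firebird 1.5 itself.

```text
# firebird.conf
# Refuse connections that give a file path; only aliases are accepted.
DatabaseAccess = None

# aliases.conf
# Hypothetical alias and path, for illustration only.
sales = C:\data\sales.fdb
```

Clients would then connect with something like servername:sales rather than servername:C:\data\sales.fdb, so no user or tool ever supplies a path string.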

Some kinds of corruption can't be cured with gfix.
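
For reference, the validate/mend/backup/restore sequence mentioned in the quoted message can be sketched as a small Python wrapper. This is only a sketch under assumptions: the file names are hypothetical, credentials are assumed to come from the ISC_USER and ISC_PASSWORD environment variables, and it is meant to be pointed at a file copy of the database, never the live file. The switches are the ones quoted in this thread, plus gbak -c for the restore step.

```python
import subprocess


def repair_commands(db_copy, backup_file, restored_db):
    """Build the command lines, in order, for the sequence from the thread.

    All three file names are hypothetical placeholders supplied by the caller.
    """
    return [
        ["gfix", "-v", "-f", db_copy],   # validate, full
        ["gfix", "-m", "-i", db_copy],   # mend, ignoring checksum errors
        ["gfix", "-v", "-f", db_copy],   # validate again to see what is left
        # back up without garbage collection (-g), ignoring bad checksums (-ig)
        ["gbak", "-b", "-g", "-ig", db_copy, backup_file],
        # restore the backup into a brand-new database file
        ["gbak", "-c", backup_file, restored_db],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            result = subprocess.run(argv)
            print("exit code:", result.returncode)


if __name__ == "__main__":
    # Dry run: only prints the commands, nothing is executed.
    run_all(repair_commands("copy.fdb", "copy.fbk", "restored.fdb"))
```

The dry run is the default so that the list can be checked before anything touches a file. If the second validation still reports errors, as it did in the quoted message, that is a sign the damage is of a kind gfix cannot mend.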

>There's gotta be a better way. Please share your strategies. We are
>looking into replication as well, but haven't figured out the logistics of
>processing bulk imports of over 100 files consisting of 500k-1M rows on
>two servers.

Sure, replication is cool - it will find (and, hopefully, reject) data that
have been corrupted en route to the database. If data was good when it was
replicated, then data damaged by trauma on the active database's disk space
has a good chance of being recovered, provided the damaged disk space
wasn't touched by a subsequent update. Fully functional replication
systems can work in system idle time as a background process, so it's not
essential to run with a "waterfall" replication model.

But the primary objective in your current situation is to find out why and
where the corruption is occurring and close the hole.

./heLen