Subject | Database corruption |
---|---|
Author | Bob Murdoch |
Post date | 2004-09-05T23:35:06Z |
Using Firebird v1.5.1.4417 in classic mode (fb_inet_server.exe) with forced
writes enabled, on a dual Xeon WinXP Server with a RAID 5 array. The database
is slightly larger than 14GB.
Each night, we have an automated backup/restore/verify script that is run
against the database.
This morning, I found that the backup had failed with the following error:
database file appears corrupt ()
-wrong page type
-page 1653541 is of wrong type (expected 7, found 5)
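For readers decoding the message above: the numbers are Firebird on-disk page-type codes. The sketch below maps them to names and estimates where the page sits in the file. It assumes the page-type constants from Firebird's `ods.h` as I recall them for the 1.5-era ODS (verify against your version's source), and a hypothetical page size of 8192 bytes (check the real value with `gstat -h`). The byte-offset estimate only holds for a single-file database. `describe_wrong_page_type` is a made-up helper for illustration, not a Firebird tool.

```python
# Page-type codes as recalled from Firebird's ods.h (1.5-era ODS).
# Assumption: verify these against the ods.h of your server version.
PAGE_TYPES = {
    0: "undefined page",
    1: "database header page",
    2: "page inventory page (PIP)",
    3: "transaction inventory page (TIP)",
    4: "pointer page",
    5: "data page",
    6: "index root page",
    7: "index B-tree page",
    8: "blob page",
    9: "generator page",
}


def describe_wrong_page_type(page: int, expected: int, found: int, page_size: int) -> str:
    """Hypothetical helper: explain a 'page N is of wrong type' error in words."""
    exp = PAGE_TYPES.get(expected, f"unknown type {expected}")
    got = PAGE_TYPES.get(found, f"unknown type {found}")
    # Byte offset of the page, valid only for a single-file database.
    offset = page * page_size
    return (
        f"page {page} (byte offset {offset:,}, about {offset / 2**30:.1f} GiB) "
        f"should be an {exp} but holds a {got}"
    )


if __name__ == "__main__":
    # The page size of 8192 is an assumption; read the real one from gstat -h.
    print(describe_wrong_page_type(1653541, 7, 5, 8192))
```

Read this way, the error says an index page was expected but a data page was found, which fits the gfix report of damaged index and data pages.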
I ran gfix -v -f on the file, and received a message regarding a number of
index and data pages with problems. I then ran gfix -m -i, and received the
same message. Running gfix -v -f still gave the same errors.
I then tried to run gbak -g -ig, and it seems to have worked. I am in the
process of running the restore to see if the above worked.
I have heard that it is not possible to corrupt a Firebird database with
forced writes on, but I have here an example of it occurring. We also had a
corrupt database in Jan or Feb of this year, though that was with a beta of
1.5 SuperServer on older server hardware (still with forced writes and
RAID 5).
With a database of this size, I can't afford to run
gfix/gfix/gfix/backup/restore/gfix, as that is at least an eight-hour
process. Since the earlier corruption this year we have maintained the
restored databases from the previous seven days just in case we run into a
catastrophic problem that causes us to 'roll back' to yesterday's database.
Either case (eight hours to repair or a full day of data lost) is
unacceptable.
There's gotta be a better way. Please share your strategies. We are
looking into replication as well, but haven't figured out the logistics of
processing bulk imports of over 100 files consisting of 500k-1M rows on
two servers.
Bob M.