Subject | Re: [firebird-support] Consistency Check When Backing Up |
---|---|
Author | Kurt Federspiel |
Post date | 2009-01-13T03:09:19Z |
Hi, Helen.
I ran gfix and was able to access the DB, but the schema was pretty messed up and a fair amount of data was lost; tables lost columns, and the data for those columns was missing.
I had a corruption on this machine (again), and the culprit was security.db; the DB daemon came to a halt. Since the security.db is not massaged through RAM on a backup, would you go out on a limb and say this is a disk issue, or is there something I am completely missing?
Thanks again for all your help.
Kurt.
----------------------------------------
Never underestimate the Power of Denial.
________________________________
From: Helen Borrie <helebor@...>
To: firebird-support@yahoogroups.com
Sent: Tuesday, December 9, 2008 3:48:11 PM
Subject: Re: [firebird-support] Consistency Check When Backing Up
At 09:30 AM 10/12/2008, you wrote:
>Thanks, Alan & Dmitry. Dmitry is usually correct. ;-)
>
>Can anyone tell me WHY this happens? Is Dmitry correct that it is a hardware issue?
Consistency check errors occur whenever the engine encounters a fatal error that it cannot explain. Hardware faults typically cause such errors.
To use your gbak -b process as an example:
gbak reads the metadata, converts it to a specialised text format and writes it out to disk. Here it is using RAM to store the metadata item and convert it. It is also working with disk space: reading and writing while doing garbage collection, reading metadata, requesting filesystem (hard disk) space from the OS for each piece of data it has to write to the backup file and then writing to that disk space.
Next, it extracts (reads) data from disk, converts and compresses it in RAM, and writes it out to the backup file.
So - if there is HDD damage - either in the database pages during the read, or in the area of disk where the backup data is being written to - then the process will get a "fatal" signal from the OS and will throw a consistency check.
If there is a fault in a RAM chip, then you have potential for the memory-intensive conversion processes to fail. Again, the engine will get a "fatal" signal and will throw a consistency check error. If you are getting a consistency check error sometimes but not every time you repeat a process, or at differing points in the process, then that's a fairly good indicator that it is faulty RAM.
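One rough way to apply that rule of thumb is to repeat the same operation several times and note where each run failed. The sketch below is only an illustration of the heuristic: the function name, the run-record format, and the result labels are all assumptions made for this example, not anything Firebird provides.

```python
from typing import Optional, Sequence


def classify_failures(failure_points: Sequence[Optional[str]]) -> str:
    """Classify repeated runs of the same process.

    Each entry is a label for the point where that run failed
    (hypothetical, e.g. a table name), or None if the run succeeded.
    """
    failures = [p for p in failure_points if p is not None]
    if not failures:
        return "no failures observed"

    # Fails only some of the time, or fails in different places each time.
    intermittent = len(failures) < len(failure_points)
    varying = len(set(failures)) > 1
    if intermittent or varying:
        return "suspect RAM"

    # Every run failed at the same point; the heuristic does not point at RAM.
    return "repeatable failure"
```

A repeatable failure does not prove disk damage on its own; it only means the RAM indicator described above does not apply.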
HDD damage can be a lot harder to trace since it is likely to be progressive. It will be hard to assess whether gbak is encountering current data that is already physically damaged, old record versions that have been damaged, etc. etc., or whether the file-writes are bumping into damaged disk areas.
And then there are things like messed-up RAID arrays, and external software such as anti-virus and file-backup tools that jump in and lock areas of disk indiscriminately... databases need to be fully protected from these utilities!!
The engine will not allow a process to continue once it knows it is broken.
Actually, running a full gbak backup is a good way to confirm a disk or RAM problem, if you have had unexplained errors during database usage. You can use it in combination with the validation and repair tools in gfix to try to pinpoint what kind of errors are occurring. Just make sure that, before you start playing around with these things, you take the database *right* off line and file-copy it to somewhere safe.
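As a hedged sketch of that sequence (shut down, file-copy, validate, back up), the Python below only builds the command lines and copies the file; it does not run anything against the server. The paths, function names, and the choice to work on the copy rather than the original are assumptions for illustration. The switches shown (`gfix -shut -force`, `gfix -v -full -n`, `gbak -b -v`) are standard ones, but check them against the documentation for your Firebird version, and supply the password however you normally do (for example via `ISC_PASSWORD`).

```python
import shutil
from typing import List


def make_safe_copy(db_path: str, copy_path: str) -> None:
    """File-copy the database; only do this once it is fully offline."""
    shutil.copy2(db_path, copy_path)


def plan_diagnostics(db_path: str, copy_path: str, backup_path: str,
                     user: str = "SYSDBA") -> List[List[str]]:
    """Return the command lines to run, in order. Nothing is executed here."""
    return [
        # 1. Take the database off line (forced shutdown, zero-second wait).
        ["gfix", "-shut", "-force", "0", "-user", user, db_path],
        # 2. After make_safe_copy(): full validation of the copy,
        #    report-only (-n means no update), so nothing gets "repaired" yet.
        ["gfix", "-v", "-full", "-n", "-user", user, copy_path],
        # 3. Full verbose backup of the copy to exercise disk and RAM.
        ["gbak", "-b", "-v", "-user", user, copy_path, backup_path],
    ]
```

For example, `plan_diagnostics("/data/app.fdb", "/safe/app_copy.fdb", "/safe/app.fbk")` (hypothetical paths) yields three argument lists that could be passed to `subprocess.run` one at a time, with `make_safe_copy` called between the first and second.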
./heLen