Subject | Re: [firebird-support] Re: End user repair utility - assuming the worst-case. |
---|---|
Author | Ann W. Harrison |
Post date | 2011-01-18T16:41:02Z |
On 1/17/2011 5:26 PM, homerjones1941 wrote:
> In my mind, backup and restore is a different matter than doing
> a repair (via gfix). If I understand correctly, implementing a
> fix via backup and restore is only done if gfix is unable to
> effect a repair.
A couple of points. First, Firebird is quite stable and rarely
corrupts databases as long as forced write is enabled - and is
honored all the way through the operating system, I/O subsystem,
disk controller, etc. Anything that says "yes boss, that's on disk" when it isn't
is a threat to database integrity. Ten years ago, InterBase
databases on Windows had a couple of bugs that proved quite
lucrative to the database repair side of the world, but they
were fixed before the first version of Firebird shipped.
And yes, there have been a few problems fixed recently that
affect indexes under heavy contention, reported as "expected
page type 7 encountered 5". Those can be fixed by dropping and recreating
the index. There are also situations that gfix reports as
index errors (errors on index pages) which are in fact a
sub-optimal index that will give correct results in queries.
(There's a long explanation of that in the archives ... basically,
you can't write two things first.) But on the whole, Firebird
is more resilient than many desktop systems.
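Since everything above depends on forced writes being on, here is a minimal sketch in Python of how that setting is usually switched with gfix. It only builds the command line and runs nothing. The function name and the file path are made up for illustration, and a real run would normally also need credentials (for example -user and -password).

```python
def forced_write_command(database, enabled=True):
    """Build the gfix argument list that turns forced writes on or off.

    gfix -write sync  -> forced (synchronous) writes enabled
    gfix -write async -> forced writes disabled
    """
    mode = "sync" if enabled else "async"
    return ["gfix", "-write", mode, database]


if __name__ == "__main__":
    # Hypothetical path, shown only to illustrate the resulting command line.
    print(" ".join(forced_write_command("/data/example.fdb")))
```

The list form can be handed to subprocess.run once you are satisfied with it. The current setting can be checked in the header page output of gstat -h.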
Second, when gfix finds a corrupted page, it simply removes it
from the table. When it finds a corrupted record, it deletes
it in the most primitive possible way. Gfix makes no effort
to find references to the corrupt record or records on the
corrupted page in indexes. In other words, after gfix has
"mended" a database, the indexes may contain pointers to records
that don't exist. The alternatives are to rebuild all indexes
or back up the database and restore it with gbak. The latter
is more reliable and does other cleanup that should improve
performance.
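If you do go the "rebuild all indexes" route rather than backup and restore, one common way is to deactivate and reactivate each user index. The following is a rough Python sketch that only generates the SQL text; it assumes a dialect 3 database (quoted identifiers), and the helper name is made up. As far as I know Firebird refuses to deactivate indexes that back primary key, unique or foreign key constraints, so the listing query leaves those out and they need separate handling.

```python
# Query that should list user-defined indexes not owned by a constraint.
# Assumption: the standard system tables RDB$INDICES and
# RDB$RELATION_CONSTRAINTS are readable by the connected user.
LIST_USER_INDEXES = (
    "SELECT RDB$INDEX_NAME FROM RDB$INDICES "
    "WHERE COALESCE(RDB$SYSTEM_FLAG, 0) = 0 "
    "AND RDB$INDEX_NAME NOT IN ("
    "SELECT RDB$INDEX_NAME FROM RDB$RELATION_CONSTRAINTS "
    "WHERE RDB$INDEX_NAME IS NOT NULL)"
)


def rebuild_index_statements(index_names):
    """Return ALTER INDEX statements that deactivate, then reactivate, each index."""
    statements = []
    for name in index_names:
        # System table names are blank-padded CHAR values, so trim them,
        # and double any embedded quote before wrapping in double quotes.
        quoted = '"' + name.strip().replace('"', '""') + '"'
        statements.append(f"ALTER INDEX {quoted} INACTIVE;")
        statements.append(f"ALTER INDEX {quoted} ACTIVE;")
    return statements


if __name__ == "__main__":
    # Hypothetical index names, as they might come back from the query above.
    for line in rebuild_index_statements(["IDX_CUSTOMER_NAME   ", "IDX_ORDER_DATE"]):
        print(line)
```

Run the generated statements through isql or your usual client, committing after each one. This only rebuilds the indexes; it does none of the other cleanup that a gbak backup and restore gives you.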
Here's my formula for database self-help.
1) copy the database
2) run gfix -v -n to determine the extent of the problem.
Keep the results.
3) attempt a gbak -v -b -g <database> <backup>
If that works, restore the database. Gbak doesn't read
user indexes, so it won't stop because of index errors.
The -g switch keeps gbak from trying to remove old versions
of records, so it ignores corrupted back versions.
4) if that fails, run gfix -v -m to try to remove areas
of corruption.
5) back up and restore the result, and compare it with what
you expected to find.
6) if that fails, try using a data pump to move data to a
clean database. If there are remaining corruptions in a
particular table, you can often figure out which primary
key values lead to bad records. Pump everything lower than
the corrupt value, then everything higher. Hope you can
find the missing records in a backup.
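The formula above can be sketched roughly as follows, in Python. Nothing is executed: the first helper only lays out the gfix and gbak command lines from the list, and the second builds the "everything lower, then everything higher" WHERE clauses for the data pump step. The function names and file names are made up, the gbak -c restore command is my assumption for "restore the database", and the key ranges assume an integer primary key. Always point these at the copy, never at the original file.

```python
def repair_plan(work_copy, backup, restored):
    """Lay out the gfix/gbak sequence as argument lists, grouped by outcome."""
    validate = ["gfix", "-v", "-n", work_copy]            # report only, keep the output
    back_up = ["gbak", "-v", "-b", "-g", work_copy, backup]  # -g: no garbage collection
    mend = ["gfix", "-v", "-m", work_copy]                # try to remove corrupt areas
    restore = ["gbak", "-v", "-c", backup, restored]      # assumption: restore to a new file
    return {
        "always": [validate, back_up],
        "if_backup_ok": [restore],
        "if_backup_failed": [mend, back_up, restore],
    }


def pump_predicates(key_column, bad_keys):
    """Return WHERE clauses covering every key range around the bad values."""
    keys = sorted(set(bad_keys))
    if not keys:
        raise ValueError("need at least one bad key value")
    clauses = [f"{key_column} < {keys[0]}"]
    # Ranges strictly between neighbouring bad keys.
    for low, high in zip(keys, keys[1:]):
        clauses.append(f"{key_column} > {low} AND {key_column} < {high}")
    clauses.append(f"{key_column} > {keys[-1]}")
    return clauses


if __name__ == "__main__":
    plan = repair_plan("copy.fdb", "copy.fbk", "restored.fdb")
    for stage, commands in plan.items():
        for argv in commands:
            print(stage, ":", " ".join(argv))
    # Hypothetical table and key values.
    for clause in pump_predicates("ID", [250, 100]):
        print("SELECT * FROM ORDERS WHERE " + clause)
```

Each clause selects a slice of the table that avoids the known bad keys, so the pump can copy the slices one at a time; whatever falls in the gaps is what you go looking for in an older backup.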
Cheers,
Ann