Subject Re: [firebird-support] Database corruption
Author Ann W. Harrison
innoy1k wrote:
>
> 1) gbak was run from IBConsole, identified the corrupted table_A. It
> was believed that the corruption was on the Blob field.

Was there some specific hint that made you think the problem was
with the blob?

> 2) A program was written to repair the database by removing the
> corrupted records.

How did you do that? As a general thing, you can't delete a corrupt
record because the delete needs to read enough of the data to release
the record, its blobs, and any back versions.

> 3) ran gfix and gbak on the repaired database on a couple of platforms:
> Win2000 and WinServer2003SE. Also ran gbak from IBConsole, all runs
> were successful.

OK. At that point, everything was working.

> 4) sent the repaired database back to client's server
> (WinServer2003SE), ran gbak from IBConsole, and got errors straight
> away on table_A. The error is: Database file appears corrupt(); bad
> checksum; checksum error on database page 54805; gds_$get_segment failed.

OK. How did you send the database back? How was it put on the client's
server? When you were running gbak, was it trying to back up or restore
the database?

The checksum is validated immediately after a page is read from disk.
Real checksums were abandoned in InterBase 4 and replaced with a simple
signature, 12345, that marks the page as plausible. So the page read
returned something implausible for page 54805... (what size pages are
you using?) gds_$get_segment is a blob call - blobs were originally
called segmented strings (by a pompous bunch of uptight marketing
people who considered the word "blob" vulgar). gds_$get_segment reads
the next piece of a blob. If it failed because of a checksum error,
then the blob is large (multi-page) and page 54805 should be a blob page.
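
The page size matters because it tells you where page 54805 sits in the
file. Here is a rough Python sketch of the arithmetic, assuming a
single-file database with pages stored back to back from byte 0. The
function name is made up for illustration, and the page sizes listed are
only candidates to try, not values taken from this database.

```python
def page_offset(page_number: int, page_size: int) -> int:
    """Byte offset of a page, assuming pages are contiguous from offset 0."""
    return page_number * page_size


# Candidate page sizes (assumed); use the one the database actually has.
for size in (1024, 2048, 4096, 8192, 16384):
    print(f"page size {size:5d}: page 54805 starts at byte {page_offset(54805, size)}")
```

With the offset in hand you can look at the same byte range in a copy of
the file that backs up cleanly elsewhere and see whether the two differ.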

> 5) reinstalled FB1.5 and IBConsole, ran gbak on the repaired database,
> but failed again.

OK.

> 6) installed FB1.5 and IBConsole on another platform (WinXP) of this
> client's office, ran gbak on the repaired database, the run was
> successful.

Ah ha. Where was the database? Still on the original server?

> 7) scanned the hard drive and its mirror disk on the server, it seemed
> ok, changed a couple of controllers, and disabled the mirror disk, then ran
> the repaired database, but failed again.
>
> In short, the checksum corruption is happening instantly. My questions
> are:
>
> Do you think this is a hardware problem?

I always think unexpected problems are hardware problems. I try not to
say so, because I'm almost always wrong.

> Why no other corruption in the same machine?

Dunno.

> What is gds_$get_segment?

As above.

Here's what I would do. One variable is the use of the services API
through IBConsole. As you know, IBConsole is a Borland product and the
fact that it works at all with Firebird annoys them a lot - so try
running the Firebird gbak utility, both on the server and on a separate
client system. There's a small but non-negligible chance that you've
got version skew in the message files and the messages you're seeing
aren't what the server intends to send. Running the gbak from a clean
1.5 installation should avoid that problem.

If gbak consistently reports an error on databases on the particular
server system and you can copy the same file to other servers and it
runs OK, then maybe, just maybe, you need an exorcist for the server.
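
Before calling the exorcist, it may be worth checking that the file on
the suspect server is byte-for-byte the same as the copy that works
elsewhere. The following is only a sketch in Python, assuming the
database is a single file and nothing is attached to it while it is
read. The function name and chunk size are my own choices for
illustration.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a large database file is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it against the file on each machine and compare the digests. If they
match and only the one server complains, the file itself is probably not
the culprit. If they differ, look at how the file was transferred.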

Regards,


Ann