Subject Corrupt header problem
Author polydwarf820
We've been dealing with a problem that has occurred a couple of times
in the past month or so on different client machines.
For one reason or another (we haven't been able to pin it down), a
customer calls in, saying that their data is corrupt. The way we
solved this the first time, not really understanding what we were
doing, was to copy the header block (as determined by looking at the
InterBase source code) from a "good" gdb file that had a lot of data
in it (definitely more than was in the bad file) to the bad gdb
file. This seemed to work: the customer could see all of their data,
and we went on our way.
However, another customer has called in with the same problem, which
is starting to concern us. After trying the same trick as before, we
were able to recover some of the data, though not all. Some of the
tables that should have roughly 1000 records have none, even though
the data is still in the gdb (I could see it with a hex editor).
We narrowed down the issue in the header to the transaction
references (hdr_oldest_transaction, hdr_oldest_active,
hdr_next_transaction). If we changed only those three fields to the
older values from an old backup of the customer's data (they don't
have a recent backup, of course), we would recover the same amount
of data as wholesale copying of the header block of the gdb file
gave us.
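
For anyone wanting to compare those three fields between two files
without a hex editor, here is a minimal Python sketch. The offsets
(28, 32, 36), the signed 32-bit field size, and the little-endian byte
order are assumptions based on my reading of the InterBase 6 header
page layout on x86; check them against the source for your ODS version
before trusting the output. The function names are made up for
illustration.

```python
import struct

# ASSUMPTION: byte offsets of the counters within the header page.
# Verify against the header page definition for your ODS version.
ASSUMED_OFFSETS = {
    "hdr_oldest_transaction": 28,
    "hdr_oldest_active": 32,
    "hdr_next_transaction": 36,
}


def read_header_counters(header, offsets=ASSUMED_OFFSETS, byte_order="<"):
    """Return the transaction counters found in a raw header block.

    header     -- bytes read from the start of a *copy* of the gdb file
    offsets    -- mapping of field name to byte offset (assumed values)
    byte_order -- "<" for little-endian, ">" for big-endian
    """
    return {
        name: struct.unpack_from(byte_order + "i", header, offset)[0]
        for name, offset in offsets.items()
    }


def diff_header_counters(good, bad, **kwargs):
    """Return {field: (good_value, bad_value)} for the fields that differ."""
    good_values = read_header_counters(good, **kwargs)
    bad_values = read_header_counters(bad, **kwargs)
    return {
        name: (good_values[name], bad_values[name])
        for name in good_values
        if good_values[name] != bad_values[name]
    }
```

To use it, read the first page of each file in binary mode (for
example, open(path, "rb").read(1024)) and pass the two byte strings to
diff_header_counters. It only reads, so it is safe to run against
copies of both the old backup and the damaged file.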

When browsing through one of the tables from the recovered data in
IBConsole, we would get a few "invalid data conversion" errors, but
we could continue scrolling. Trying to delete the individual record
that seemed to be causing the problem made a null record appear as
the next record (the null record wasn't there before the attempted
delete).
We attempted a validate on the table, and the standard validation
validated everything correctly the first time. When we included
record fragments in the validation we got "Error 335544344. Error
while trying to read from file. Reached end of file."
A gbak -v ran without any errors, but its output showed no records
at all in the tables that should have plenty of them. (I didn't
bother trying a restore, since the records I want/need weren't
reported in the backup.)

The fact that changing the transaction IDs helps seems significant,
though we're not sure how. Possibly a transaction that was in the
middle of processing got stopped by a machine failure (hard lock,
BSOD, etc.), and that led to the file being corrupted? Stepping the
transaction ID numbers back by 1 didn't help, so we're going to try
different values to see if anything makes a difference.
The table that gives us the data conversion errors when browsing it
is probably linked to the root problem as well, or at least it seems
it should be.
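
Trying different values by hand in a hex editor gets tedious, so
here is a rough Python sketch of how candidate headers could be
generated. It works purely on bytes in memory and never touches the
original file. The helper names are hypothetical, and the offset you
pass in (for example 36 for hdr_next_transaction) is an assumption to
be checked against the InterBase source for your ODS version, as are
the signed 32-bit size and little-endian byte order.

```python
import struct


def patch_counter(header, offset, value, byte_order="<"):
    """Return a copy of header with a signed 32-bit value written at offset.

    The input bytes are left unchanged; offset is an assumed field
    position supplied by the caller.
    """
    patched = bytearray(header)
    struct.pack_into(byte_order + "i", patched, offset, value)
    return bytes(patched)


def candidate_headers(header, offset, start, stop, step=1, byte_order="<"):
    """Yield (value, patched_header) for each candidate counter value
    in range(start, stop, step)."""
    for value in range(start, stop, step):
        yield value, patch_counter(header, offset, value, byte_order)
```

Each patched header would then be written over the first page of a
fresh copy of the damaged gdb, and that copy opened to see how many
records become visible. Always work on copies, since a wrong guess
could make things worse.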

Has anyone come across anything vaguely similar, or is there some way
to get record fragments out of a gdb, other than the by-hand method,
which isn't going to happen? :)
And on a more basic level, if you've seen something like this, what
were the conditions in which it happened, so we can try to keep it
from happening again?

- Jason