Subject Database corruption (new instance)
Author Adam
Hello Group,

Back in late October, I reported an issue where a customer was
experiencing regular database corruption (~3 times/week). It was
always an error in one of several indices on a very busy table.

Details here:
http://tech.groups.yahoo.com/group/firebird-support/message/97962

The solution offered there was to upgrade to 2.1. The customer was
upgraded to 2.1 in early December and had been running for over two
months without issue.

Yesterday, however, one of our automated tools started logging a
failure from the database that, at first glance, appears very similar.

Environment:
Windows 2003 Server
Firebird 2.1 Classic
16 CPU Cores
FDB file ~2.5GB

From firebird.log
---
DBSERVER Mon Feb 09 11:49:07 2009

Database: PRODUCTdb

database file appears corrupt (E:\PRODUCT\DATA\PRODUCTDB.FDB)

wrong page type

page 295704 is of wrong type (expected 7, found 3)

internal gds software consistency check (error during
savepoint backout (290), file: exe.cpp line: 4034)
---

I needed to run gfix -validate, but since this is a live database I
didn't want to disconnect everyone unless absolutely necessary. I made
a copy of the database (nbackup -L, file copy, nbackup -N, then
nbackup -F on the copy) and validated the copy instead.
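
For anyone wanting to follow the same procedure, here is a rough sketch of the sequence as I understand it. It only builds and prints the command lines and executes nothing. The function name is made up for illustration, the gfix switches (-v -full) are my assumption of a typical full validation, and credentials (-U/-P for nbackup, -user/-password for gfix) are left out.

```python
def build_copy_and_validate_steps(live_db, copy_db):
    """Return the command lines for validating a copy of a live database.

    Nothing is executed here; the caller decides how to run each step.
    """
    return [
        ["nbackup", "-L", live_db],        # lock: new writes go to the delta file
        ["copy", live_db, copy_db],        # plain file copy of the locked main file
        ["nbackup", "-N", live_db],        # unlock: delta is merged back into the main file
        ["nbackup", "-F", copy_db],        # fixup: clear the locked state on the copy
        ["gfix", "-v", "-full", copy_db],  # assumed switches: full validation of the copy
    ]


# Paths taken from the log excerpts in this post.
LIVE_DB = r"E:\PRODUCT\DATA\PRODUCTDB.FDB"
COPY_DB = r"E:\PRODUCT\ADAM\PRODUCTDB.FDB"

steps = build_copy_and_validate_steps(LIVE_DB, COPY_DB)
for argv in steps:
    print(" ".join(argv))
```

The ordering is the important part: the live file is only touched by the lock and unlock steps, and both the fixup and the validation run against the copy.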

As expected, there were a few (3 from memory) index page corruptions.

From firebird.log (validation of the copy):
---
DBSERVER Mon Feb 09 11:57:33 2009

Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB

Index 1 is corrupt on page 295473 level 0. File:
..\..\..\src\jrd\validation.cpp, line: 1537

in table SOMETABLE (215)

DBSERVER Mon Feb 09 11:57:34 2009

Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB

Page 295704 wrong type (expected 7 encountered 3)

DBSERVER Mon Feb 09 11:57:34 2009

Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB

Index 4 is corrupt on page 295704 level 255. File:
..\..\..\src\jrd\validation.cpp, line: 1454

in table SOMETABLEREPLICATION (264)

DBSERVER Mon Feb 09 11:57:34 2009

Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB

Index 4 is corrupt on page 295704 level 255. File:
..\..\..\src\jrd\validation.cpp, line: 1468

in table SOMETABLEREPLICATION (264)

DBSERVER Mon Feb 09 11:57:38 2009

Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB

Page 297835 is used but marked free

DBSERVER Mon Feb 09 11:57:38 2009

Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB

Page 297895 is used but marked free

---

From the gfix output, I identified the affected table, dropped and
recreated its foreign key constraints, and the issue is "resolved"
(i.e., the automated tool succeeds now).
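
In case it helps anyone doing the same triage, below is a minimal sketch of pulling the table/index pairs out of the validation entries. The regular expressions are my own guess at the message layout based only on the excerpts above, and the function name is made up; the sample text is trimmed from the log in this post.

```python
import re

# Patterns inferred from the log excerpts in this post (an assumption,
# not an official description of the firebird.log format).
INDEX_RE = re.compile(r"Index (\d+) is corrupt on page (\d+) level (\d+)")
TABLE_RE = re.compile(r"in table (\w+) \((\d+)\)")


def corrupt_indices(log_text):
    """Return sorted, de-duplicated (table_name, index_number) pairs."""
    found = set()
    pending_index = None
    for line in log_text.splitlines():
        index_match = INDEX_RE.search(line)
        if index_match:
            # Remember the index number until its "in table" line shows up.
            pending_index = int(index_match.group(1))
            continue
        table_match = TABLE_RE.search(line)
        if table_match and pending_index is not None:
            found.add((table_match.group(1), pending_index))
            pending_index = None
    return sorted(found)


SAMPLE_LOG = r"""
Index 1 is corrupt on page 295473 level 0. File:
..\..\..\src\jrd\validation.cpp, line: 1537
in table SOMETABLE (215)
Page 295704 wrong type (expected 7 encountered 3)
Index 4 is corrupt on page 295704 level 255. File:
..\..\..\src\jrd\validation.cpp, line: 1454
in table SOMETABLEREPLICATION (264)
Index 4 is corrupt on page 295704 level 255. File:
..\..\..\src\jrd\validation.cpp, line: 1468
in table SOMETABLEREPLICATION (264)
"""

pairs = corrupt_indices(SAMPLE_LOG)
for table, index_number in pairs:
    print(table, index_number)
```

This only lists which indices were flagged; mapping an index number back to an index or constraint name still has to be done against the database metadata.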

Some questions:
* Clearly the upgrade to 2.1 resolved a lot of these index issues. Are
there still known issues with indices on extremely busy tables? (This
table flags PKs requiring replication to hundreds of remote devices,
so it is not uncommon to manipulate hundreds of records per second in
this table.)
* I observed something unusual with nbackup. After running -N, the
delta file was left behind. Obviously it remains during the merge, but
using Process Explorer I could see that no fb_inet_server.exe
instance was holding a handle to it. Any ideas?

Thanks in advance

Adam