Subject | Database corruption (new instance) |
---|---|
Author | Adam |
Post date | 2009-02-09T22:51:41Z |
Hello Group,
Back in late October, I reported an issue where a customer was
experiencing regular database corruption (~3 times/week). It was
always an error in one of several indices on a very busy table.
Details here:
http://tech.groups.yahoo.com/group/firebird-support/message/97962
The solution offered there was to upgrade to 2.1. The customer was
upgraded to 2.1 in early December and had been running for over two
months without issue.
Yesterday, however, one of our automated tools started logging a
failure from the database that at first glance appears very similar.
Environment:
Windows 2003 Server
Firebird 2.1 Classic
16 CPU Cores
FDB file ~2.5GB
From firebird.log
---
DBSERVER Mon Feb 09 11:49:07 2009
Database: PRODUCTdb
database file appears corrupt (E:\PRODUCT\DATA\PRODUCTDB.FDB)
wrong page type
page 295704 is of wrong type (expected 7, found 3)
internal gds software consistency check (error during
savepoint backout (290), file: exe.cpp line: 4034)
---
I needed to run gfix -validate, but since this is a live database I
didn't want to take everyone offline unless absolutely necessary. I made
a copy of the database (nbackup -L, file copy, nbackup -N, then
nbackup -F on the copy) and validated the copy.
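For anyone wanting to repeat the copy-and-validate sequence, here is a minimal Python sketch of the steps above. The function name, the injectable `run` and `copy_file` parameters, and the gfix switches (`-v -full -n`, a read-only validation) are assumptions for illustration, not the exact commands I ran. Credentials are omitted; it assumes they are supplied some other way (for example extra switches added to each command).

```python
import shutil
import subprocess


def snapshot_and_validate(db, copy, run=subprocess.run, copy_file=shutil.copyfile):
    """Copy a live database under an nbackup lock, then validate the copy.

    `run` and `copy_file` are injectable so the sequence can be dry-run.
    """
    # Lock the main file; from here on, writes go to the delta file.
    run(["nbackup", "-L", db], check=True)
    try:
        # Plain file copy of the now-frozen main database file.
        copy_file(db, copy)
    finally:
        # Always unlock, so the delta is merged back even if the copy failed.
        run(["nbackup", "-N", db], check=True)
    # The copy still carries the "locked" state flag; -F fixes it up.
    run(["nbackup", "-F", copy], check=True)
    # Validate the copy only; -n asks gfix to report without updating.
    return run(["gfix", "-v", "-full", "-n", copy], check=False)
```

The `try`/`finally` is the one design choice worth noting: if the file copy fails, the live database should still be unlocked rather than left running on its delta file.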
As expected, there were a few (three, from memory) index page corruptions.
From Firebird.log:
---
DBSERVER Mon Feb 09 11:57:33 2009
Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
Index 1 is corrupt on page 295473 level 0. File:
..\..\..\src\jrd\validation.cpp, line: 1537
in table SOMETABLE (215)
DBSERVER Mon Feb 09 11:57:34 2009
Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
Page 295704 wrong type (expected 7 encountered 3)
DBSERVER Mon Feb 09 11:57:34 2009
Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
Index 4 is corrupt on page 295704 level 255. File:
..\..\..\src\jrd\validation.cpp, line: 1454
in table SOMETABLEREPLICATION (264)
DBSERVER Mon Feb 09 11:57:34 2009
Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
Index 4 is corrupt on page 295704 level 255. File:
..\..\..\src\jrd\validation.cpp, line: 1468
in table SOMETABLEREPLICATION (264)
DBSERVER Mon Feb 09 11:57:38 2009
Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
Page 297835 is used but marked free
DBSERVER Mon Feb 09 11:57:38 2009
Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
Page 297895 is used but marked free
---
From the gfix validation output, I identified the table, dropped and
recreated the foreign key constraints, and the issue is "resolved"
(i.e., the automated tool succeeds now).
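Picking the affected tables out of the validation entries can be scripted. The following is a rough Python sketch that assumes the log layout shown in the excerpt above (an "Index N is corrupt on page P" line followed a little later by an "in table NAME (id)" line); the function name and regular expressions are my own and only cover those two message shapes.

```python
import re

# Patterns for the two message shapes seen in the excerpt above.
INDEX_RE = re.compile(r"Index (\d+) is corrupt on page (\d+) level (\d+)")
TABLE_RE = re.compile(r"in table (\S+) \((\d+)\)")


def corrupt_indices(log_text):
    """Return a sorted list of unique (table, index_number, page) tuples."""
    found = set()
    pending = None  # last "Index N is corrupt" hit still waiting for its table
    for line in log_text.splitlines():
        m = INDEX_RE.search(line)
        if m:
            pending = (int(m.group(1)), int(m.group(2)))
            continue
        m = TABLE_RE.search(line)
        if m and pending:
            found.add((m.group(1), pending[0], pending[1]))
            pending = None
    return sorted(found)
```

Run against the entries above, this should collapse the repeated SOMETABLEREPLICATION messages into a single entry alongside the SOMETABLE one. Mapping the reported index number to an index name still has to be done separately against the database metadata.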
Some questions:
* Clearly the upgrade to 2.1 resolved a lot of these index issues. Are
there still known issues with indices on extremely busy tables? (This
table flags PKs requiring replication to hundreds of remote devices,
so it is not uncommon to manipulate hundreds of records per second in
this table.)
* I observed something unusual with nbackup. After running -N, the
delta file was left behind. Obviously it remains during the merge, but
using Process Explorer I could see that there were no
fb_inet_server.exe instances holding a handle to it. Any ideas?
Thanks in advance
Adam