Subject | Database corruption - weird behavior (IMHO) after back-up/restore. Possible hardware problem |
---|---|
Author | Alexandre Benson Smith |
Post date | 2009-01-07T03:55:26Z |
Hi !
/opt/firebird/bin/fbserver -z
Firebird TCP/IP server version LI-V2.0.0.12748 Firebird 2.0 (SS
architecture)
uname -a
Linux cpack-01-s 2.6.15-1.2054_FC5smp #1 SMP Tue Mar 14 16:05:46 EST
2006 i686 i686 i386 GNU/Linux
This is the second time I have had a corrupted FB database. The first
one was due to a disk failure. I have not verified the cause of this one
yet, but I suspect RAM (the OS reported a problem earlier in the day).
I have no practical experience with corrupted databases; FB is stable
enough that I do not have to deal with this kind of trouble regularly ;)
So I need your advice...
What I did:
1.) Stopped the FB service
2.) Renamed the original database
3.) Copied the renamed file back to the original name to work on the copy
4.) Tried gbak with -g without success. Sorry, I did not write down the
error, but I can reproduce it if needed (I don't think it is)
5.) gfix -validate
6.) gfix -mend
Results from firebird.log:
cpack-01-s (Server) Tue Jan 6 23:26:53 2009
Database: /home/bd/odin.fdb
internal gds software consistency check (wrong record length
(183), file: vio.cpp line: 1090)
cpack-01-s (Server) Tue Jan 6 23:35:31 2009
Database: /home/bd/odin.fdb
Index 3 is corrupt on page 61191 level 0. File:
../src/jrd/validation.cpp, line: 1549
in table NOTAFISCALENDERECO (184)
cpack-01-s (Server) Tue Jan 6 23:37:13 2009
Database: /home/bd/odin.fdb
Relation has 4 orphan backversions (0 in use) in table CLIENTE (139)
cpack-01-s (Server) Tue Jan 6 23:37:23 2009
Database: /home/bd/odin.fdb
Relation has 9 orphan backversions (0 in use) in table LOTE (175)
cpack-01-s (Server) Tue Jan 6 23:37:42 2009
Database: /home/bd/odin.fdb
Relation has 3 orphan backversions (0 in use) in table
NOTAFISCAL (183)
cpack-01-s (Server) Tue Jan 6 23:37:49 2009
Database: /home/bd/odin.fdb
Record 663384 is wrong length in table NOTAFISCALITEM (188)
cpack-01-s (Server) Tue Jan 6 23:38:11 2009
Database: /home/bd/odin.fdb
Relation has 322 orphan backversions (0 in use) in table
PEDIDOVENDA (201)
cpack-01-s (Server) Tue Jan 6 23:38:26 2009
Database: /home/bd/odin.fdb
Relation has 35 orphan backversions (0 in use) in table
PEDIDOVENDAITEM (203)
cpack-01-s (Server) Tue Jan 6 23:38:33 2009
Database: /home/bd/odin.fdb
Relation has 121 orphan backversions (0 in use) in table PRODUTO
(207)
cpack-01-s (Server) Tue Jan 6 23:39:01 2009
Database: /home/bd/odin.fdb
Record 344286 is wrong length in table SYNC_ACTIONDONE (273)
cpack-01-s (Server) Tue Jan 6 23:39:01 2009
Database: /home/bd/odin.fdb
Chain for record 344286 is broken in table SYNC_ACTIONDONE (273)
cpack-01-s (Server) Tue Jan 6 23:39:03 2009
Database: /home/bd/odin.fdb
Relation has -1 orphan backversions (1 in use) in table
SYNC_ACTIONDONE (273)
cpack-01-s (Server) Tue Jan 6 23:39:03 2009
Database: /home/bd/odin.fdb
Page 74700 is an orphan
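For anyone triaging a similar log, here is a minimal Python sketch that groups the validation messages by table, so it is easier to see which tables are affected. It is an illustration only and was not part of the steps described here. The function name `messages_by_table` and the sample text are assumptions made for the example; the sample lines are copied from the excerpt above. The pattern relies only on the "in table NAME (id)" suffix visible in these entries.

```python
import re
from collections import defaultdict

# A few entries copied from the firebird.log excerpt above.
SAMPLE_LOG = """\
cpack-01-s (Server) Tue Jan 6 23:37:13 2009
Database: /home/bd/odin.fdb
Relation has 4 orphan backversions (0 in use) in table CLIENTE (139)
cpack-01-s (Server) Tue Jan 6 23:37:49 2009
Database: /home/bd/odin.fdb
Record 663384 is wrong length in table NOTAFISCALITEM (188)
cpack-01-s (Server) Tue Jan 6 23:39:01 2009
Database: /home/bd/odin.fdb
Record 344286 is wrong length in table SYNC_ACTIONDONE (273)
cpack-01-s (Server) Tue Jan 6 23:39:01 2009
Database: /home/bd/odin.fdb
Chain for record 344286 is broken in table SYNC_ACTIONDONE (273)
"""

# Matches "<message> in table NAME (id)". \s+ also matches a newline, so an
# entry wrapped before the table name or the id is still recognised. Only the
# last physical line of a multi-line message is kept as the message text.
TABLE_RE = re.compile(r"(?P<msg>[^\n]+?)\s+in\s+table\s+(?P<table>\w+)\s+\(\d+\)")


def messages_by_table(log_text):
    """Group validation messages by the table named at the end of each one."""
    grouped = defaultdict(list)
    for match in TABLE_RE.finditer(log_text):
        grouped[match.group("table")].append(match.group("msg").strip())
    return dict(grouped)


if __name__ == "__main__":
    for table, messages in messages_by_table(SAMPLE_LOG).items():
        print(table, len(messages), messages)
```

Entries that name no table, such as the "internal gds software consistency check" lines or "Page 74700 is an orphan", are skipped by this sketch and still need to be read by hand.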
7.) Made a backup with gbak, successfully
8.) Restored it to another file. Everything went well until the index
creation phase, where I got an error. firebird.log shows this:
cpack-01-s (Server) Tue Jan 6 23:53:00 2009
Database: /home/bd/odinteste.fdb
internal gds software consistency check (wrong record length
(183), file: vio.cpp line: 1090)
I would have expected errors during the restore for FK violations, null
violations, etc., but not on the data itself. If I could read the
original database and write the backup file, I would expect the data to
be OK. I am not talking about referential integrity or consistency with
the declared table rules, only about reading the data. Index creation
just reads back what was written to disk moments earlier... I did not
expect a problem here...
9.) Restored successfully using gbak's -i option, so the indices are
inactive
10.) Dropped the 2 tables that had problems during the restore; they
currently do not hold useful information (Sync_Action and
Sync_ActionDone)
11.) Ran gbak again on that database without indices; everything seemed OK
12.) Restored that backup to another file, successfully.
gbak: restoring privilege for user SYSDBA
gbak:creating indexes
gbak: committing metadata
gbak:finishing, closing, and going home
As you can see, there is no index activation for the restored database.
So far so good...
At this point I think I have "all" the data in place and just need to
re-activate the indices. In fact, I will create a clean database and pump
the data over from this one. But when I started to play with the
database to check that everything was OK, I got some errors.
Database: /home/bd/odinteste_sem_indice.fdb, User: sysdba
SQL> select count(*) from cliente;
COUNT
============
Statement failed, SQLCODE = -902
internal gds software consistency check (decompression overran buffer
(179), file: sqz.cpp line: 223)
My question is:
If gbak could read the data to create the backup and then restore it
successfully, why can't I read it from this freshly created database?
Database: /home/bd/odinteste.fdb, User: sysdba
SQL> select count(*) from cliente;
COUNT
============
22021
Again, to make it clear:
odinteste.fdb was created using:
gbak backup_from_corrupted_database_after_gfix.fbk odinteste.fdb -c
-user sysdba -password masterkey -v -i
and I can read the data from it.
(without the -i option I could not restore, due to problems with a table
that I dropped before the next step)
Then I created a backup of this database, the one I could select from:
gbak odinteste.fdb odinteste_sem_indice.fbk -user sysdba -password
masterkey -v
Then I created another database to test the restore:
gbak odinteste_sem_indice.fbk odinteste_sem_indice.fdb -c -user sysdba
-password masterkey -v
If I try to read from this new database, I get the errors shown above.
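To make the sequence easier to repeat on another machine, the three gbak calls above can be written down as a small dry-run script. This is only a sketch: the helper names `backup_cmd`, `restore_cmd` and `round_trip` are made up for the example, and it uses only the switches that already appear in the commands above (`-c`, `-i`, `-v`, `-user`, `-password`). It prints the command lines and does not execute gbak.

```python
import shlex


def backup_cmd(database, backup_file, user, password):
    # Same shape as "gbak odinteste.fdb odinteste_sem_indice.fbk ... -v" above.
    return ["gbak", database, backup_file,
            "-user", user, "-password", password, "-v"]


def restore_cmd(backup_file, database, user, password, inactive_indices=False):
    # -c creates a new database from the backup; -i leaves indices inactive.
    cmd = ["gbak", backup_file, database, "-c",
           "-user", user, "-password", password, "-v"]
    if inactive_indices:
        cmd.append("-i")
    return cmd


def round_trip(mended_backup, user, password):
    """Return the three gbak calls from the post, in the order they were run."""
    return [
        # Restore the backup of the mended database with indices inactive.
        restore_cmd(mended_backup, "odinteste.fdb", user, password,
                    inactive_indices=True),
        # Back up that restored database again.
        backup_cmd("odinteste.fdb", "odinteste_sem_indice.fbk", user, password),
        # Restore the second backup into yet another database, without -i.
        restore_cmd("odinteste_sem_indice.fbk", "odinteste_sem_indice.fdb",
                    user, password),
    ]


if __name__ == "__main__":
    # Dry run: print the command lines instead of executing them.
    for cmd in round_trip("backup_from_corrupted_database_after_gfix.fbk",
                          "sysdba", "masterkey"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists means each one could later be handed to `subprocess.run` unchanged, which avoids shell quoting problems with paths or passwords.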
I think I had a hardware problem here (my customer reported OS errors
that were "fixed" after a reboot).
Right now I am accessing the machine through an ssh session. Tomorrow I
will be on site and will try the restore on my notebook to see if the
problems disappear.
Any hints, comments, or advice would be very welcome.
TIA
see you !
--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br