Subject URGENT: Corrupt indices
Author Michael Gast
Hi,

we have very strange errors in one of our customers production environment.

We are using Firebird CS 1.0.0 Build 796 on a cluster system with RedHat
Linux 7.2.

About 7 weeks weeks ago we started with a brand new implemented
production control system. After 5 weeks of work without any trouble, we
started to get problems with the system. We have got messages in our
application like
"database file appears corrupt () wrong page type
page XXX is of wrong type (expected 7, found 0)"
"database file appears corrupt () wrong page type
page YYY is of wrong type (expected 5, found 0)"
These errors are NOT written to the interbase.log file. They arise only
in our applications. If we start same the transaction a second time, the
error usually does not occur again, but sometimes it does. At this point
we started gfix -v -f to examine the database. It reported us a lot of
messages like "Index 3 is corrupt on page 299129 in table FSATZ (159)".

After running
gfix -m -f -i
gbak -b -g -t -v
gbak -c -r -p -v
gfix -v -f did not report any errors in our database.

But the same errors occurs after only a few days of work. The same cycle
of gfix -m -i, backup, restore, gfix -v -f repairs the database only for
a few days.

Some information to our system:
We have two databases: One consists of only one table with two rows, the
primary key and a blob field. It has about 1 millions of records and a
total size of about 9 GB spanned over 5 files (with a maximum of 30
files with each 250000 8 KB blocks). This database works fine.
The second database is much more complicated: The database size is about
1 GB. We have configured the system to use 3 files with 500000 4 KB
blocks to stay under the maximum file size of 2 GB.
1. The biggest table has about 2.8 millions of records.
2. The total amount of records over all tables is round about 6 millions
of records.
3. We have 80 tables with 131 foreign keys, 112 stored procedures and
171 triggers.
4. Disabling the sweep process did not change anything.
5. The index errors usually occurs in the most heavily frequented tables.
6. We have two different types of transactions:
- Import and export of productions data is done in transactions with up
to 20000 insert operations in the first database (with one table) and up
to 40000 insert or update operations in the second database. One of
these transactions is running up to 6 minutes.
- User transactions which usually does not change a lot of data.
We have no long running transactions.

Does anyone have an idea? Any hints?
Are there for example any circumstances to allow errors to be
transported over a cycle of gfix, backup, restore and gfix?

Thanks a lot in advance.

Micha