Subject Database corruptions with 2.0.3 embedded server
Author jaratulpas
We have a heavily multithreaded application written on top of the
Firebird 2.0.3 Embedded Server for Windows. Our problem is that about
20% of our users have experienced database corruptions.

Almost all corruptions are of the same kind: wrong page type. While
traversing a B-tree index, the engine suddenly lands on a data page
instead of an index page:

*** IBPP::SQLException ***
Context: Statement::Execute( ... )
Message: isc_dsql_execute2 failed
SQL Message : -902
Unsuccessful execution caused by a system error that precludes
successful execution of subsequent statements
Engine Code : 335544335
Engine Message :
database file appears corrupt ()
wrong page type
page 266120 is of wrong type (expected 7, found 5)
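
For reference, the two numbers in the last line are Firebird page-type
codes. The small helper below decodes them. The table is our reading of
the page-type constants in the engine's ods.h for the 2.0 sources, so
treat it as an assumption and check it against your own engine version.
The function name pageTypeName is made up for this post.

```cpp
#include <string>

// Page-type codes as we understand them from ods.h in the Firebird 2.0
// sources. This table is an assumption, not an official reference.
std::string pageTypeName(int type)
{
    switch (type) {
        case 0:  return "undefined";
        case 1:  return "database header";
        case 2:  return "page inventory";
        case 3:  return "transaction inventory";
        case 4:  return "pointer";
        case 5:  return "data";
        case 6:  return "index root";
        case 7:  return "index B-tree";
        case 8:  return "blob";
        case 9:  return "generators";
        case 10: return "write-ahead log";
        default: return "unknown";
    }
}
```

With this table, "expected 7, found 5" reads as "expected an index
B-tree page, found a data page".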

The remaining 2% of the corruptions are detected as page checksum errors:

*** IBPP::SQLException ***
Context: Statement::Execute( ... )
Message: isc_dsql_execute2 failed
SQL Message : -902
Unsuccessful execution caused by a system error that precludes
successful execution of subsequent statements
Engine Code : 335544335
Engine Message :
database file appears corrupt ()
bad checksum
checksum error on database page 189721

We suspect a thread-related problem, since all corruptions occurred on
machines with hyper-threaded processors.

The application is written in C++, compiled with VS2005, and accesses
the Firebird engine only through the IBPP library. We have, however,
added some extensions of our own: database connections are serially
reused, and prepared statements are cached.
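
To illustrate what "serially reused" means here, below is a simplified
sketch, not our production code. A connection is handed to one thread at
a time and returned to a pool when that thread is done with it. The
names FakeConnection and ConnectionPool are invented for this post, and
FakeConnection is only a stand-in for a real connection handle; no IBPP
calls are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a real database connection handle.
struct FakeConnection {
    explicit FakeConnection(int i) : id(i), useCount(0) {}
    int id;
    int useCount;  // how many times this connection has been handed out
};

// Hypothetical pool: each connection has exactly one owner at a time.
class ConnectionPool {
public:
    explicit ConnectionPool(int size) {
        for (int i = 0; i < size; ++i)
            idle_.push_back(
                std::unique_ptr<FakeConnection>(new FakeConnection(i)));
    }

    // Hands out an idle connection, or nullptr if all are in use.
    std::unique_ptr<FakeConnection> acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (idle_.empty())
            return nullptr;
        std::unique_ptr<FakeConnection> conn = std::move(idle_.back());
        idle_.pop_back();
        ++conn->useCount;
        return conn;
    }

    // Returns a connection to the pool so another thread can reuse it.
    void release(std::unique_ptr<FakeConnection> conn) {
        assert(conn != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        idle_.push_back(std::move(conn));
    }

    std::size_t idleCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::unique_ptr<FakeConnection>> idle_;
};
```

Because ownership moves through std::unique_ptr, two threads cannot hold
the same connection simultaneously in this sketch; only the pool's
internal list is shared, and it is guarded by the mutex.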

When the application is first started, it loads about 3 GB of data
from a server into the database, which the user then accesses. Most
database corruptions appear during this initial bulk load.

We tried to reproduce the database corruption with stress tests, but
have been unsuccessful so far. We did get some strange database
exceptions (e.g. an impossible "violation of PRIMARY key" reported as
a deadlock), but no database corruptions.

The most relevant error in our stress tests is this bugcheck:

*** IBPP::SQLException ***
Context: Statement::Execute( ... )
Message: isc_dsql_execute2 failed
SQL Message : -902
Unsuccessful execution caused by a system error that precludes
successful execution of subsequent statements
Engine Code : 335544333
Engine Message :
internal gds software consistency check (CCH_precedence: block marked
(212), file: cch.cpp line: 3653)

It happens after the stress test creates its database and spawns
several threads that insert rows into a table. It appears only on
hyper-threaded CPUs; setting the CPU affinity to a single core makes
the error disappear.
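
The shape of the stress test is roughly as follows. This is a
stripped-down sketch rather than the actual test: runInsertStress and
StressResult are names invented for this post, and insertRow is a
hypothetical callback standing in for the code that executes one INSERT
through the database layer.

```cpp
#include <atomic>
#include <exception>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical result type: how many inserts succeeded and how many threw.
struct StressResult {
    long succeeded;
    long failed;
};

// Spawns `threads` workers. Each worker calls
// insertRow(threadIndex, rowIndex) `rowsPerThread` times and counts
// exceptions instead of aborting, so one failing insert does not
// stop the run.
StressResult runInsertStress(int threads, int rowsPerThread,
                             const std::function<void(int, int)>& insertRow)
{
    std::atomic<long> ok(0);
    std::atomic<long> bad(0);
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&, t] {
            for (int r = 0; r < rowsPerThread; ++r) {
                try {
                    insertRow(t, r);
                    ++ok;
                } catch (const std::exception&) {
                    ++bad;
                }
            }
        });
    }

    // Wait for all workers before reading the counters.
    for (std::thread& w : workers)
        w.join();

    return StressResult{ok.load(), bad.load()};
}
```

In a real run the callback would execute the INSERT on a connection
owned by the calling thread, and the caught exception would be logged
with its engine code instead of just being counted.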

Since the bugcheck happens in BTR_insert, could it lead to database
corruption? (The discussion in CORE-1199 does not cover it.)

Could someone please point out some possible causes, or what else to
look at?

Or if someone could fix it in Firebird... :-)

Thanks for any help.

Jara