firebird-support - Re: [firebird-support] Re: Database corruption

Subject	Re: [firebird-support] Re: Database corruption
Author	Helen Borrie
Post date	2008-10-30T00:23:43Z

>--- In firebird-support@yahoogroups.com, "Ann W. Harrison"
><aharrison@...> wrote:
>>
>> Adam,
>> > I had a report from a customer about a corrupt database, and I am
>> > looking for any hints to determine what has actually occurred and
>> > (hopefully) what may have caused it.
>> >
>> > Win 2003 Server, 1.5.5 Classic, from firebird.log:
>> >
>> > SERVER Tue Oct 28 14:30:04 2008
>> > Database: E:\DATA\DATA.FDB
>> > database file appears corrupt ()
>> > wrong page type
>> > page 195537 is of wrong type (expected 7, found 5)
>> > internal gds software consistency check (error during savepoint
>> > backout (290))
>> >
>> > This error occurs hundreds of times as different connections have hit
>> > the corrupt page and crashed.
>>
>
>Hello Ann,
>
>>
>> A backup/restore will correct the problem, but the solution(*) is to
>> move to a more recent version of Firebird.

At 09:11 AM 30/10/2008, Adam wrote:

>By "correct the problem", I suppose you mean "correct the problem
>until it happens again"?
>
>By "more recent", I presume that this is a known issue in 1.5.x that
>won't be fixed, and that 2.1 should (90%) be OK? We are still in
>testing with this version, but if that is the fix, then I will ensure
>that some focus is given to completing our testing of it.

A customer of ours distributes modeling software that rapidly deletes record sets in a "fake temporary table" situation. Records are keyed on a generated session ID. The application would delete a complete set of temporary records and immediately recreate a new set using the same key. During peak load, Firebird 2.0.3 would segfault. Alex Peshkov traced the problem to the fact that indices are not updated in the same transaction as the one that deletes the nodes. Under normal load, there's time for the index to catch up before the same node is "re-added". When throughput is heavy, there is the potential for an untrapped race condition to occur, causing a segfault. When segfaults happen, bets tend to be off with respect to the integrity of pages.

For 2.0.4 (AFAIR) he made it so that this race condition would cause an exception instead of simply proceeding and causing a segfault. I don't know whether this fix made it into v.1.5.5 but, if it didn't and this sounds similar to your application conditions, it could be worth testing v.2.0.4.

FWIW, the customer resolved the problem by generating *two* keys for each session and applying them alternately when regenerating the model sets.
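
For illustration only, here is a rough Python sketch of that alternating-key idea. It is not the customer's code: the names AlternatingKeys and regenerate are made up for this example, a plain dict stands in for the "fake temporary table", and the two key values are assumed to have been drawn beforehand from a generator.

```python
import itertools


class AlternatingKeys:
    """Hands out two session keys in turn (hypothetical helper).

    The key whose rows were just deleted is never the key used for the
    rows inserted immediately afterwards.
    """

    def __init__(self, key_a, key_b):
        if key_a == key_b:
            raise ValueError("the two keys must differ")
        self._cycle = itertools.cycle((key_a, key_b))
        self.current = None  # no set has been written yet

    def swap(self):
        """Return (old_key, new_key) and make new_key the current one."""
        old = self.current
        self.current = next(self._cycle)
        return old, self.current


def regenerate(store, keys, rows):
    """Replace the session's record set, switching to the other key.

    `store` is a dict standing in for the table: key -> list of rows.
    In a real application these two steps would be a DELETE and a batch
    of INSERTs.
    """
    old, new = keys.swap()
    if old is not None:
        store.pop(old, None)   # delete the previous set (old key)
    store[new] = list(rows)    # recreate the set under the other key
    return new


# Usage: 101 and 102 are placeholder values for two generated IDs.
keys = AlternatingKeys(101, 102)
store = {}
regenerate(store, keys, ["row 1", "row 2"])   # written under 101
regenerate(store, keys, ["row 3"])            # 101 deleted, written under 102
```

The point of the design is only that a delete and the insert that follows it never touch the same key value, so the new rows do not "re-add" index nodes that were removed a moment earlier.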

./heLen