Subject | Re: Database corruption (new instance) |
---|---|
Author | Adam |
Post date | 2009-02-11T04:25:18Z |
Hello Vlad,
...
> > I needed to run gfix -validate, but being a live database I didn't
> > want to take everyone out unless absolutely necessary. I made a copy
> > of the database (nbackup -L, file copy, nbackup -N, nbackup -F on
> > copy). I then proceeded to validate the copy.
> >
> > As expected, there were a few (3 from memory) index page corruptions.
> >
> > From Firebird.log:
> > ---
> > DBSERVER Mon Feb 09 11:57:33 2009
> >
> > Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
> >
> > Index 1 is corrupt on page 295473 level 0. File:
> > ..\..\..\src\jrd\validation.cpp, line: 1537
> >
> > in table SOMETABLE (215)
>
> Never seen such error before. It means some index key is less
> than previous... very strange case :(
Any ideas about what could cause such a problem? That piece of log was
from the nbackup destination database, not the original if that makes
any difference.
>
> > DBSERVER Mon Feb 09 11:57:34 2009
> >
> > Database: E:\PRODUCT\ADAM\PRODUCTDB.FDB
> >
> > Page 295704 wrong type (expected 7 encountered 3)
>
> I would like to fix such errors but i have no reproducible test case
I would love to provide a reproducible test case, but it doesn't seem
like the easiest thing to duplicate. I have only observed it this
once, but if there were particular pre-conditions I could look out
for, I might be able to create an example.
It occurs in the busiest table of the database, with probably 50 or so
connections simultaneously inserting or deleting records (no updates).
The table holds the PKs of records that require replication, so if any
connection changes something significant, triggers call a stored
procedure that inserts a row for that record into the replication table.
Basically, the stored procedure does the following:
---
BEGIN
  DELETE FROM SOMETABLEREPLICATION
   WHERE KEY1 = :VALUE1
     AND KEY2 = :VALUE2;
  WHEN ANY DO
  BEGIN
    -- IGNORE
  END
END
INSERT INTO SOMETABLEREPLICATION (ID, KEY1, KEY2) VALUES
  (GEN_ID(GEN_SOMETABLEREPLICATIONID, 1), :VALUE1, :VALUE2);
---
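For context, here is a rough sketch of how this might look as complete DDL and PSQL. The SOMETABLEREPLICATION table, the generator and the parameter names come from the snippet above; the column types, the procedure name SP_QUEUE_REPLICATION, the source table SOMETABLE with its KEY1/KEY2 columns, and the trigger are assumptions made purely for illustration, not the actual schema.
---
-- Assumed queue table and generator (names as used above, types guessed).
CREATE TABLE SOMETABLEREPLICATION (
  ID   BIGINT  NOT NULL,
  KEY1 INTEGER NOT NULL,
  KEY2 INTEGER NOT NULL
);
CREATE GENERATOR GEN_SOMETABLEREPLICATIONID;

SET TERM ^ ;

CREATE PROCEDURE SP_QUEUE_REPLICATION (VALUE1 INTEGER, VALUE2 INTEGER)
AS
BEGIN
  -- Remove any entry already queued for this key pair; swallow lock
  -- conflicts (or any other error) caused by concurrent writers.
  BEGIN
    DELETE FROM SOMETABLEREPLICATION
     WHERE KEY1 = :VALUE1
       AND KEY2 = :VALUE2;
    WHEN ANY DO
    BEGIN
      -- IGNORE
    END
  END

  -- Queue the key pair again with a fresh generator value.
  INSERT INTO SOMETABLEREPLICATION (ID, KEY1, KEY2)
    VALUES (GEN_ID(GEN_SOMETABLEREPLICATIONID, 1), :VALUE1, :VALUE2);
END^

-- One of the triggers on a replicated table, calling the procedure
-- whenever something significant changes.
CREATE TRIGGER TRG_SOMETABLE_REPL FOR SOMETABLE
AFTER INSERT OR UPDATE OR DELETE
AS
BEGIN
  IF (DELETING) THEN
    EXECUTE PROCEDURE SP_QUEUE_REPLICATION (OLD.KEY1, OLD.KEY2);
  ELSE
    EXECUTE PROCEDURE SP_QUEUE_REPLICATION (NEW.KEY1, NEW.KEY2);
END^

SET TERM ; ^
---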
It is possible (though in practice unlikely) for two simultaneous
connections to attempt to use the same Value1, Value2 combination. If
this happens, one would get a lock conflict and you would end up with
two (or more) duplicate records.
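To make that failure mode concrete, one hypothetical interleaving could look like the timeline below (the starting row, the IDs and the exact point of the conflict are illustrative assumptions; the real behaviour depends on the isolation level and wait mode in use).
---
-- Assume a row (ID = 10, KEY1 = 1, KEY2 = 2) is already queued.
--
-- Connection A                           Connection B
-- ------------                           ------------
-- DELETE ... KEY1 = 1 AND KEY2 = 2;
--   (deletes ID = 10, row now locked)
--                                        DELETE ... KEY1 = 1 AND KEY2 = 2;
--                                          (lock conflict on ID = 10,
--                                           swallowed by WHEN ANY DO)
-- INSERT ... (GEN_ID(...), 1, 2);
--   (new row, ID = 11)
--                                        INSERT ... (GEN_ID(...), 1, 2);
--                                          (new row, ID = 12)
-- COMMIT;                                COMMIT;
--
-- Net result: two queued rows (ID = 11 and ID = 12) for the same
-- KEY1/KEY2 pair, which the replication query later collapses with DISTINCT.
---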
The replication logic itself simply joins from this table, using a
distinct clause to remove the duplicates. After it commits the changes
to the other database, it issues the following query.
DELETE FROM SOMETABLEREPLICATION
WHERE KEY1=:VALUE1
AND KEY2=:VALUE2
AND ID < :MAXID
(The max ID was the maximum visible ID for the given key combination
at the moment the replication started, thus being careful to not
remove from the replication table any changes made after the
replication process started).
...
> > * I observed something unusual with nbackup. After running -N, the
> > delta file was left. Obviously it is left during the merge, but using
> > process explorer I could see that there was no fb_inet_server.exe
> > instances holding a handle. Any ideas?
>
> Upgrade to 2.1.2 as soon as it released. It have some fixes for nbackup.
> I don't remember exactly, probably even 2.1.1 have nbackup patch. Release
> Notes definitely helps you.
The version installed is 2.1.1.17910, which IIRC is the latest
released. I will certainly read the release notes of 2.1.2 when that
is released.
I noticed an issue in the 2.1 Bug Fixes pdf that seems related:
http://tracker.firebirdsql.org/browse/CORE-1139
It is flagged as fixed but describes exactly what I saw.
Adam