Subject | Re: [firebird-support] Validation problem |
---|---|
Author | Aldo Caruso |
Post date | 2012-01-08T18:40:16Z |
Ann,
Thanks for your accurate answer.
I did save the database file with the problematic record (after first
shutting the database down, and before any attempt to restore or mend
it).
So today I connected to it and executed the same query that led to
the situation I described. It ran with no problem and gave no error
regarding locking or concurrency (mainly because no other processes are
running today, Sunday).
I must conclude that, when I did that a few days ago, some
processes were still connected (or reconnected as soon as I brought the
database online) and caused a locking conflict that hung my client
process.
Perhaps the backup and restore was unnecessary, and maybe the
problem could have been solved by shutting the database down, taking
exclusive access to it, running the query logged in as sysdba, and then
bringing the database back online.
Nevertheless, the database still has the same record level error as
it had days ago.
Running either "gfix -validate -full" or "gfix -mend -full -ignore"
still throws the same error message:
"Summary of validation errors
Number of record level errors : 1"
I believe it may be a harmless orphaned record, which a backup
and restore corrects.
Thanks again for your support,
Aldo Caruso
On 07/01/12 19:19, Ann Harrison wrote:
> On Sat, Jan 7, 2012 at 9:19 AM, Aldo Caruso
> <aldo.caruso@... <mailto:aldo.caruso%40argencasas.com>> wrote:
>
> > I didn't lose information, so I assume it was an orphaned back
> > version.
> > Nevertheless it is not clear what produced this pattern:
> >
> > - The client (flamerobin) tried to apply an update on a table.
> > - The update was rejected because of a lock conflict (normal behavior
> > when other clients are also applying updates).
>
> OK.
>
> > - The client did a transaction rollback.
>
> OK.
>
> > - Any subsequent attempt to apply the same update hung the client
> > forever.
>
> That's truly strange. I don't suppose you saved the database with the
> stuck record? Here's what's supposed to happen. When a transaction
> attempts to update a record that was updated by a concurrent
> transaction, it asks the lock manager for a shared lock on the
> conflicting transaction's transaction number. Each transaction keeps
> an exclusive lock on its transaction id until it completes. When the
> conflicting transaction completes, the waiting transaction is
> notified, checks the state of the transaction it was waiting for, and
> if that transaction committed, it gets an update conflict error.
> That's the most common way for conflicts to be resolved. The
> transaction that got the error rolls back, restarts, and finds that the
> previously conflicting change is no longer a conflict, because the
> restarted transaction began after the new version was committed, and
> all is well... more or less. As long as transactions correctly change
> their state from active to committed or rolled back when they end,
> conflicts go away.
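A minimal Python sketch of the wait-then-check behavior described above, under simplifying assumptions: the exclusive lock each transaction holds on its own id is modeled as a `threading.Event` that is set when the transaction completes, and the waiter's shared-lock request becomes a blocking `wait()` on that event. The names `Transaction`, `TxState`, `try_update` and `UpdateConflict` are hypothetical, not Firebird's API, and the sketch assumes the record's newest version was written by a transaction concurrent with the updater.

```python
import threading
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"


class UpdateConflict(Exception):
    """Raised when the transaction we waited for committed its change."""


class Transaction:
    """Hypothetical stand-in for a transaction and the lock on its id."""

    def __init__(self, tx_id: int) -> None:
        self.tx_id = tx_id
        self.state = TxState.ACTIVE
        # Models the exclusive lock held on the transaction id: the event
        # stays unset (lock held) until the transaction completes.
        self._completed = threading.Event()

    def finish(self, state: TxState) -> None:
        """Commit or roll back, then release the id lock."""
        if state is TxState.ACTIVE:
            raise ValueError("a finished transaction cannot stay active")
        self.state = state
        self._completed.set()

    def wait_for_completion(self) -> None:
        """Models the waiter's shared-lock request: blocks until completion."""
        self._completed.wait()


def try_update(record_owner: Transaction) -> str:
    """Try to update a record whose newest version belongs to record_owner.

    Assumes record_owner is concurrent with the updating transaction.
    """
    if record_owner.state is TxState.ACTIVE:
        record_owner.wait_for_completion()
    # After waking up, check how the conflicting transaction ended.
    if record_owner.state is TxState.COMMITTED:
        raise UpdateConflict(
            f"update conflict with concurrent transaction {record_owner.tx_id}"
        )
    # The conflicting version was rolled back, so the update can go ahead.
    return "update applied"
```

For example, scheduling `threading.Timer(0.1, owner.finish, args=(TxState.COMMITTED,)).start()` and then calling `try_update(owner)` blocks for roughly 0.1 s and then raises `UpdateConflict`; finishing with `TxState.ROLLED_BACK` instead lets the update through.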
>
> The second type of conflict is an actual deadlock, detected by the
> lock manager. The lock manager is an in-memory subsystem that manages
> a table of locks. It does a periodic walk of the wait-graph, looking
> for cycles. For example, transaction 123 might be waiting for
> transaction 100 while transaction 100 is waiting for 123 (or any chain
> of waits that closes a cycle, which could be ten or twenty transactions
> long). When the lock manager finds a cycle, it picks a victim, and
> Firebird sends that transaction a message telling it to give up.
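The periodic deadlock check can be sketched as a depth-first search for a cycle in a wait-for graph. This is a simplified model, not the lock manager's actual code: the graph is a hypothetical dict mapping each waiting transaction to the set of transactions it waits for, and choosing the highest-numbered transaction on the cycle as the victim is an arbitrary rule assumed here for illustration, since the thread does not say how Firebird picks its victim.

```python
from typing import Dict, List, Optional, Set

WaitsFor = Dict[int, Set[int]]  # waiter -> transactions it is waiting for


def find_cycle(waits_for: WaitsFor) -> Optional[List[int]]:
    """Return one cycle in the wait-for graph as a list of ids, or None."""
    on_path: List[int] = []      # current DFS path, in visiting order
    in_path: Set[int] = set()    # same nodes, for O(1) membership tests
    finished: Set[int] = set()   # fully explored, no cycle reachable

    def visit(tx: int) -> Optional[List[int]]:
        on_path.append(tx)
        in_path.add(tx)
        for holder in sorted(waits_for.get(tx, ())):
            if holder in in_path:
                # Back edge: the path from holder to tx closes a cycle.
                return on_path[on_path.index(holder):]
            if holder not in finished:
                cycle = visit(holder)
                if cycle is not None:
                    return cycle
        on_path.pop()
        in_path.discard(tx)
        finished.add(tx)
        return None

    for start in sorted(waits_for):
        if start not in finished:
            cycle = visit(start)
            if cycle is not None:
                return cycle
    return None


def choose_victim(cycle: List[int]) -> int:
    """Hypothetical victim rule: pick the highest-numbered transaction."""
    return max(cycle)
```

With Ann's example, `find_cycle({123: {100}, 100: {123}})` returns `[100, 123]`, and the assumed rule selects 123 as the victim. A chain such as `{1: {2}, 2: {3}}` has no cycle, so nobody is chosen and the waits simply resolve as transactions finish.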
>
> It's vaguely possible that the lock manager code might incorrectly
> fail to release a lock, but since that's entirely in memory,
> restarting the server would resolve the problem.
>
> > - Only solution: backup and restore.
> >
>
> I'm having a very hard time imagining a situation that would require a
> backup/restore to resolve a record update conflict. A partial
> two-phase commit is the one situation that might seem intractable, but
> it should cause the backup to fail, unless it was run with the "ignore
> limbo" switch. And when a transaction tries to update a record that
> was updated by an earlier transaction that failed in the middle of a
> two-phase commit, it gets an immediate error, which should suggest
> that gfix be used to resolve the state of that transaction.
>
> Transactions wait only for transactions that are active. When the
> server restarts, it looks at the state of transactions with numbers
> lower than the Next Transaction value on the header page and sets any
> that are still active to "rolled back". I don't suppose you saved that database...
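The restart rule can be sketched as a single pass over the transaction states. This is a simplified model under stated assumptions: the states live in a plain dict (`tip`, a hypothetical stand-in for the on-disk transaction inventory), `next_transaction` plays the role of the Next Transaction value from the header page, and limbo transactions from a partial two-phase commit are assumed to be left untouched, since the text above says those are resolved with gfix rather than automatically.

```python
from typing import Dict

ACTIVE, COMMITTED, ROLLED_BACK, LIMBO = "active", "committed", "rolled back", "limbo"


def recover_after_restart(tip: Dict[int, str], next_transaction: int) -> Dict[int, str]:
    """Return a copy of tip with every pre-restart active transaction rolled back.

    Ids at or above next_transaction are ignored, and limbo transactions
    (assumption) are left for gfix to resolve.
    """
    recovered = dict(tip)
    for tx_id, state in tip.items():
        if tx_id < next_transaction and state == ACTIVE:
            # Nothing from before the restart can still be running, so no
            # later transaction should ever wait on this one.
            recovered[tx_id] = ROLLED_BACK
    return recovered


def must_wait_for(tip: Dict[int, str], tx_id: int) -> bool:
    """Transactions wait only for transactions that are active."""
    return tip.get(tx_id) == ACTIVE
```

Here `recover_after_restart({1: COMMITTED, 2: ACTIVE, 3: LIMBO}, 4)` marks transaction 2 as rolled back and leaves 1 and 3 unchanged, so a later update that meets a record version written by transaction 2 no longer waits.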
>
> Good luck,
>
> Ann