firebird-support - Re: [firebird-support] Validation problem

Subject	Re: [firebird-support] Validation problem
Author	Ann Harrison
Post date	2012-01-07T22:19:35Z

On Sat, Jan 7, 2012 at 9:19 AM, Aldo Caruso <aldo.caruso@...> wrote:

> I didn't lose information, so I assume it was an orphaned back version.
> Nevertheless it is not clear what produced this pattern:
>
> - The client (flamerobin) tried to apply an update on a table.
> - The update was rejected because of a lock conflict (normal behavior
> when other clients are also applying updates).

OK.

> - The client did a transaction rollback.

OK.

> - Any subsequent attempt to apply the same update hanged the client for
> ever.

That's truly strange. I don't suppose you saved the database with the
stuck record? Here's what's supposed to happen. When a transaction
attempts to update a record that was updated by a concurrent
transaction, it asks the lock manager for a shared lock on the
conflicting transaction's transaction number. Each transaction keeps
an exclusive lock on its transaction id until it completes. When the
conflicting transaction completes, the waiting transaction is
notified, checks the state of the transaction it was waiting for, and
if that transaction committed, it gets an update conflict error.
That's the most common way for conflicts to be resolved. The
transaction that got the error rolls back, restarts, finds that the
previously conflicting change is no longer a conflict because the
running transaction started after the new version was committed and
all is well... more or less. As long as transactions correctly change
their state from active to committed or rolled back when they end,
conflicts go away.

The second type of conflict is an actual deadlock, detected by the
lock manager. The lock manager is an in-memory subsystem that manages
a table of locks. It does a periodic walk of the wait-graph, looking
for cycles. So if transaction 123 is waiting for transaction 100 and
transaction 100 is waiting for 123 (or any sequence that leads to a
cycle, could be ten or twenty transactions long). When the lock
manager finds a cycle, it picks a victim and Firebird sends that
transaction a message telling it to give up.

It's vaguely possible that the lock manager code might incorrectly
fail to release a lock, but since that's entirely in memory,
restarting the server would resolve the problem.

> - Only solution: backup and restore.
>

I'm having a very hard time imagining a situation that would require a
backup/restore to resolve a record update conflict. A partial
two-phase commit is the one situation that might seem intractable, but
it should cause the backup to fail, unless it was run with the "ignore
limbo" switch. And when a transaction tries to update a record that
was updated by an early transaction that failed in the middle of a
two-phase commit, it gets an immediate error, which should suggest
that gfix be used to resolve the state of that transaction.

Transactions wait only for transactions that are active. When the
server restarts, it looks at the state of transactions with numbers
lower than the Next Transaction value on the header page and sets them
to "rolled back". I don't suppose you saved that database...

Good luck,

Ann