Subject Re: [firebird-support] Deadlock or Timeout ?
Author Helen Borrie
At 10:57 PM 3/01/2008, you wrote:
>We have a program that regularly generates an error when executing the
>update SQL below when running overnight and its the only active
>program/connection (Linux classic server LI-V2.0.1.12855 Firebird 2.0)
>but always works in the day when there are multiple users accessing
>the same data.
>
>The only obvious difference is that overnight it has to clear >6000
>records (out of 36000) whereas during the day the number is usually < 1000

Even 36000 is not a huge number.


>SQL = 'UPDATE PARTS SET INMRP = 0 where INMRP <> 0 '
>INMRP is defined as INTEGER DEFAULT 0 NOT NULL
>
>Error message = 'deadlock
>update conflicts with concurrent update'
>
>The program closes all its SQL's and transactions prior to executing
>this one. I trap the failure and then step through each records with
>INMRP<>0 and clear them individually - which always works.
>
>Question - is this actually a deadlock

No - most locking errors are reported at top level as "deadlock". The next message tells you that it is an update conflict - that is, a record can't be updated because another transaction already has a pending (i.e. uncommitted) update or delete or delete for it.

>, or simply the server timing out the SQL because of the number of records being updated ??

No. It's caused by a conflict with work from a different transaction.

The culprit could be a ghost left sitting in someone's chair: if you're running this update in snapshot isolation, then your transaction will experience this error if it was started before the retiring users' transactions finished committing, especially if some of them had WAIT transactions. It's kinda pointing that way, given that you *can* perform the same updates manually one by one. (Much slower reaching the problem record!)

Even without WAIT on those other transactions, it might be that User X's work set off a lot of garbage collection, that's slowing down the knock-off process. You might like to consider delaying the start of the overnight operation for a few minutes if it's a regular problem.

Also check any update triggers that might be bumping into uncommitted work on some other table, or transactions whose operations have trigger operations affecting the PARTS table.

If all else fails, and you consider it a problem, you'll need to sleuth for users who have left their application running without committing work.

./heLen