Subject Re[3]: [Firebird-Architect] FW: Deadlocks on FB1.5
Author Jim Starkey
At 10:28 PM 5/15/03 +0400, Nickolay Samofatov wrote:

>I still do not have a perfect understanding of DPM operation, but what
>I see is:
>
>I. The engine dies (or begins to crawl extremely slowly) under contention
>inside the deadlock-detection loops. The reason for this behaviour
>is the following logic (common in DPM code):
>
> 1. get lock 1
> 2. try to get lock 2 with timeout of 1 second
> 3. if timeout expired release lock 1 and go to step 1
>
>As you can understand, under contention the engine can do useful work
>only in the short window between step 3 and step 1, while lock 1 is
>released. The worst case is that under Windows XP another process
>usually doesn't see the mutex flipping for a short period of time, and
>the engine halts completely.
>
>
>Could you explain the original concept to me? What is the secret weapon
>for fighting such problems?
>
>What I'm doing now is trying to identify the places where real
>deadlocks are reported and fix them up by adding retry spin loops.
>But I don't think this was your original concept.


The original concept is that the database is "careful write", meaning that
physical pages are written in an order such that the database on disk is
always valid and consistent. For a system to be careful write, the access
algorithms, even under contention, must be deadlock free.

It is not easy to design and implement careful write systems. To my knowledge,
Interbase and its predecessor Rdb/ELN are the only two such systems. The
key to careful write systems is that the on-disk structure is controlled by
page locks, and cross-page pointers are traversed with a formal handoff where
the target page is locked before releasing the lock on the pointer. To make
things deadlock free, the on-disk structures must be traversed in a consistent
order, say index top to leaf, then only to the right. Another rule is that a
lock cannot be upgraded; if a logical upgrade is required, the data structures
must be re-traversed from the top.

Looking at the code, it is apparent that somebody (post Borland acquisition)
decided that deadlock free was too hard and handling deadlocks was a better
idea. That person was wrong, but the damage has been done. The problem
with systems that internally deadlock is that it is difficult if not
impossible to prove them careful write, or even demonstrate with any
certainty that they work under load. Depending on deadlock detection for
physical access paths is a performance disaster for reasons you discovered.
Deadlock detection is expensive; too expensive to do on every lock wait,
particularly with the knowledge that most apparent deadlocks are
self-clearing with blocking ASTs. For efficiency, then, lock managers are
generally designed so deadlock scans are only performed after a relatively
long timeout. But waiting for a long timeout is a disaster when traversing
the ODS.

You've got a mess on your hands. As currently structured, the code has a
nasty tradeoff of the sort I went through a lot of effort to avoid.

I suspect there are two, maybe three, reasons for the current design (if you
call it a design): reverse index traversal; the need to support classic,
where physical deadlocks are self-clearing, and superserver, where they
can't occur; and possibly index bucket recombination (if done stupidly).
At the moment, you've got the worst of both worlds. But I can tell you with
absolute certainty that the current implementation is unnecessary -- neither
classic nor superserver needs it. Maybe the combined code base does;
I hope not.

I guess the bottom line is that the original design concept is now irrelevant.

Good luck.

Jim Starkey