Subject Re[4]: [Firebird-Architect] FW: Deadlocks on FB1.5
Author Nickolay Samofatov
Hello, Jim,

> It is not easy to design and implement careful write systems. To my knowledge,
> Interbase and its predecessor Rdb/ELN are the only two such systems. The
> key to careful write systems is that the on-disk structure is controlled by
> page locks and cross page pointers are traversed with a formal handoff where
> the target page is locked before releasing the lock on the pointer. To make
> things deadlock free, the on-disk structures must be traversed in a consistent
> order, say index top to leaf then only to the right. Another rule is that a
> lock cannot be upgraded; if a logical upgrade is required, the data structures
> must be re-traversed from the top.

Thanks. This explains a lot to me.
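
To check my understanding, the handoff rule can be sketched roughly as
below. This is only an illustration in Python, not engine code: the Page
class, its fields and find_leaf are hypothetical names, the pages are
in-memory stand-ins with non-empty key lists, and threading.Lock stands in
for a page lock.

```python
import threading

class Page:
    """Hypothetical in-memory stand-in for an on-disk page."""
    def __init__(self, keys, child=None, right=None):
        self.lock = threading.Lock()
        self.keys = keys      # sorted, non-empty keys stored on this page
        self.child = child    # pointer down one level (None on a leaf)
        self.right = right    # pointer to the right sibling (None if last)

def find_leaf(root, key):
    """Walk top to leaf, then only to the right, with lock handoff.

    Returns the leaf page with its lock still held; the caller releases it.
    """
    page = root
    page.lock.acquire()
    while True:
        if page.right is not None and key > page.keys[-1]:
            nxt = page.right      # key lies beyond this page: move right
        elif page.child is not None:
            nxt = page.child      # otherwise descend one level
        else:
            return page           # leaf reached
        nxt.lock.acquire()        # lock the target page first...
        page.lock.release()       # ...then release the page we came from
        page = nxt
```

Every traversal takes locks in the same order (down, then right) and never
holds more than two at once, so two walkers cannot wait on each other in a
cycle.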

> Looking at the code, it is apparent that somebody (post Borland acquisition)
> decided that deadlock free was too hard and handling deadlocks was a better
> idea. That person was wrong, but the damage has been done. The problem
> with systems that internally deadlock is that it is difficult if not
> impossible to prove them as careful write, or even demonstrate with any
> certainty that they work under load. Depending on deadlock detection for
> physical access path is a performance disaster for reasons you discovered.
> Deadlock detection is expensive; too expensive to do on every lock wait,
> particularly with the knowledge that most apparent deadlocks are self-clearing
> with blocking ASTs. For efficiency, then, lock managers are generally designed
> so deadlock scans are only performed after a relatively long timeout. But
> waiting for a long timeout is a disaster when traversing the ODS.
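
For reference, a deadlock scan amounts to looking for a cycle among waiting
lock owners. The sketch below is a simplified illustration in Python, not
the lock manager's code: find_deadlock and the waits_for mapping are
hypothetical, and it assumes each owner waits on at most one other owner.

```python
def find_deadlock(waits_for, start):
    """Follow the wait chain from start; return the cycle found, or None.

    waits_for maps an owner to the single owner it is blocked on.
    """
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)   # None once the chain reaches a non-waiter
    if node is None:
        return None                  # chain ended: no deadlock from here
    return path[path.index(node):]   # owners that form the cycle
```

For example, find_deadlock({"A": "B", "B": "C", "C": "A"}, "A") returns
["A", "B", "C"], while a chain that ends at an owner who is not waiting
returns None. Running even this walk on every lock wait adds up, which is
why the scan is normally deferred until a timeout.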

I now understand what to do to make Firebird 1.5 work (more or less, or at
least no worse than 1.0), since an immediate DPM/VIO review is overkill for
the Firebird 1.5 release:
1. Revert to the older deadlock-handling behaviour for pages in classic
builds and for relations in all builds (as they use asynchronous processing
of ASTs via the blocking flag and LCK_re_post).
2. Stress-test the engine on WinXP in this configuration. If hard lockup
problems appear, work around them by tweaking process scheduling manually.

> I guess that bottom line is that the original design concept is now irrelevant.

No. Your explanation helped a lot. I'll put a DPM/VIO review on the
Firebird 2 TODO list.

> Good luck.

Nickolay Samofatov