Subject Re: [Firebird-devel] V4_THREADING
Author Jim Starkey
First, unless someone has a better idea, I'm going to move the threading
discussion to the architecture list.

Nickolay and I have been having a good discussion of the issues around
dragging Firebird into the fine granularity threading world. We have
agreed that the primary synchronization mechanism should be read/write
(I prefer shared/exclusive terminology) locks, not mutexes. We haven't
reached agreement on how those functions should be implemented or
exactly what their semantics should be.

Let's take the semantics question first. The only real question is what
happens when a thread holding an exclusive lock on an object requests a
subsequent exclusive lock on the same object. Classical lock semantics
hold that either this is an error or a single-thread deadlock. Monitor
(as in Java) semantics treat this case as a legal operation, bump a use
count (decremented on release), and release the lock when the use count
goes to zero.
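
To make the monitor behavior concrete, here is a minimal C++ sketch of the
exclusive side only. The class name MonitorLock and its members are
hypothetical, not taken from Firebird or Netfrastructure, and it leans on
std::mutex for the first acquisition rather than on any fast-path trick.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical illustration of monitor (re-entrant) exclusive-lock semantics.
class MonitorLock {
public:
    void lockExclusive() {
        const std::thread::id self = std::this_thread::get_id();
        // Only the owning thread ever stores its own id, so seeing it here
        // means this thread already holds the lock.
        if (owner_.load(std::memory_order_relaxed) == self) {
            ++useCount_;              // re-entry: just bump the use count
            return;
        }
        mutex_.lock();                // first acquisition: may block
        owner_.store(self, std::memory_order_relaxed);
        useCount_ = 1;
    }

    void release() {
        assert(owner_.load(std::memory_order_relaxed) == std::this_thread::get_id());
        if (--useCount_ == 0) {       // last release actually frees the lock
            owner_.store(std::thread::id(), std::memory_order_relaxed);
            mutex_.unlock();
        }
    }

    // Meaningful only when called by the thread that holds the lock.
    int useCount() const { return useCount_; }

private:
    std::mutex mutex_;
    std::atomic<std::thread::id> owner_{std::thread::id()};
    int useCount_ = 0;                // touched only by the current owner
};
```

With classical semantics the second lockExclusive() call from the same
thread would be an error or a self-deadlock; here it only raises the count.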

The monitor behavior is very convenient in any recursive activity,
obviating any need for additional housekeeping to avoid an error or
deadlock. I can't say for certain whether it is
required for fine granularity threading in Firebird, but it is used in
Netfrastructure for Java class loading, a highly recursive activity.

A second question is performance tradeoffs. The easy way to make a
rwlock class is to use a mutex to control access to the synchronization
object, making an uncontested lock more expensive than a mutex. The Sun
rwlock mechanism has this performance characteristic, though I can't say
this is their implementation. The alternative is a homebrew using
whatever interlock ingredients are available on the platform (more on that
later). An rwlock mechanism optimized for database use buys fast
uncontested locks at the cost of higher overhead for contested locks. I know of
no generally available rwlock mechanism with both monitor semantics and
fast lock performance, though there may be some. Anyone? Anyone?
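
As a rough illustration of the fast uncontested path, the following C++
sketch keeps the whole lock state in one atomic word, so an uncontested
shared or exclusive lock costs a single compare-and-swap. The class
FastSharedLock is hypothetical. It has no monitor semantics, and for
brevity a contested request just yields and retries, where a real
implementation would queue and block the waiter (and would have to deal
with writer starvation).

```cpp
#include <atomic>
#include <thread>

// Hypothetical sketch: state_ >= 0 is the number of shared holders,
// kExclusive (-1) means one exclusive holder.
class FastSharedLock {
public:
    bool tryLockShared() {
        int s = state_.load(std::memory_order_relaxed);
        while (s >= 0) {
            // On failure, s is refreshed with the current value and we retry.
            if (state_.compare_exchange_weak(s, s + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;
        }
        return false;                 // an exclusive holder is present
    }

    bool tryLockExclusive() {
        int expected = 0;             // succeeds only if nobody holds the lock
        return state_.compare_exchange_strong(expected, kExclusive,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    // Contested path: a crude stand-in for real waiting.
    void lockShared()    { while (!tryLockShared())    std::this_thread::yield(); }
    void lockExclusive() { while (!tryLockExclusive()) std::this_thread::yield(); }

    void unlockShared()    { state_.fetch_sub(1, std::memory_order_release); }
    void unlockExclusive() { state_.store(0, std::memory_order_release); }

private:
    static constexpr int kExclusive = -1;
    std::atomic<int> state_{0};
};
```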

In the absence of a portable solution, rolling your own is the alternative.
The plus side is that you get exactly the semantics you need. The
downside is a lack of portability. The synchronization classes I have built
rely on interlocked (i.e. atomic) increment/decrement instructions.
Microsoft exports the interlocked instructions as a compiler intrinsic.
On Linux, the instructions can be invoked, with some pain, with inline
assembler. Most (modern) non-Intel architectures implement a
compare-and-swap instruction, from which an atomic increment can be
trivially implemented. PowerPC implements a two instruction variation
on the compare-and-swap in which the swap fails if a bus lock is lost.
Earlier processor architectures, the 32 bit sparc, for example, support
only a test-and-set instruction, useful for a mutex but lousy for
anything more elaborate.
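
The compare-and-swap route to an atomic increment can be sketched in
portable C++ with std::atomic, which postdates the compiler intrinsics and
inline assembler mentioned above. The helper name casIncrement is
hypothetical.

```cpp
#include <atomic>

// Hypothetical helper: atomic increment built only from compare-and-swap.
// Returns the value after the increment.
inline int casIncrement(std::atomic<int>& counter) {
    int observed = counter.load(std::memory_order_relaxed);
    // If another thread changed the counter in the meantime, the exchange
    // fails, observed is refreshed with the current value, and we retry.
    while (!counter.compare_exchange_weak(observed, observed + 1)) {
    }
    return observed + 1;
}
```

A test-and-set instruction cannot express this loop, because it never
reports the previous value of a counter, only whether a flag was already set.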

I've built Netfrastructure around a fast-lock monitor class implemented
with platform specific user mode interlocked increment/decrement
instructions. I only support Intel architecture, and it works like a
charm under very high load on a fast dual-processor system. Extension
to sparc V9, PowerPC, and Itanic would be "straightforward". (Note:
folks buy the machine to run Netfrastructure, so I haven't the slightest
interest in a port to anything but the AMD 64).