Subject Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability
Author Jim Starkey
Paul Ruizendaal wrote:
>> There can't be consistency without communication.
> True
>> The trick is to minimize the communication
> True
>> and, to the degree possible, allow it to operate asynchronously.
> True again.
>> A pending transaction can make a tentative
>> update and continue to execute pending a return message from the
>> resolution agent as long as it can rollback to the update statement and
>> return an error like a proper SQL database. This means, specifically,
>> that a mass update can have hundreds of pending resolutions in flow, so
>> it needn't block until statement end and it has to return a proper
>> status code to the client.
> Why does this approach minimize communication? Is it not much more
> efficient to order writes and reject attempted commits whose write set
> overlaps with an earlier, concurrent commit?

Traditional record locking requires both read and write ordering, so
every record fetch requires synchronization with all nodes that have the
record present. This is a lot of messaging. And worse, the transaction
is blocked until the distributed lock manager grants the lock request.
That's pathetically slow unless a) there is very low latency
communication available or b) the distributed lock manager is designed
so that most locks can be resolved locally.
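
To put rough numbers on the blocking cost, here is a back-of-envelope
sketch. The function name and the cost model are assumptions made for
illustration only: one lock per record fetch, a locally resolved lock
costs nothing, and every other lock blocks for one full round trip.

```python
def lock_wait_ms(fetches, round_trip_ms, local_fraction):
    """Estimate the time a transaction spends blocked on lock grants.

    Illustrative model only: one lock per fetch, local grants are free,
    and each remote grant blocks for one network round trip.
    """
    remote_locks = fetches * (1.0 - local_fraction)
    return remote_locks * round_trip_ms


# 1000 fetches, 1 ms round trip, nothing resolved locally
print(lock_wait_ms(1000, 1.0, 0.0))    # 1000.0
# case a) much lower latency interconnect
print(lock_wait_ms(1000, 0.125, 0.0))  # 125.0
# case b) three quarters of the locks resolved locally
print(lock_wait_ms(1000, 1.0, 0.75))   # 250.0
```

Under this model the wait scales linearly with both the round trip time
and the fraction of locks that have to leave the node, which is why
either a) or b) is needed.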

The original Interbase was designed and implemented to run on a cluster
with a distributed lock manager. If you replaced the Firebird lock
manager with a distributed lock manager, it would also work in a
cluster. Having done it, I know the performance characteristics. It
works well under high load with low contention, but gets pretty
miserable under high contention as nodes play ping-pong with disk pages,
using the disk as the network. Oracle RAC had the same problems until they
required an ultra-low latency communication system for the lock manager,
an ultra-high bandwidth communication system for I/O, and caching disk
controllers. Still, cluster Interbase/Firebird and Oracle RAC can only
work in clusters with internal high bandwidth.

Serialization requires something that orders transactions, so
transaction ordering by locking requires per record synchronization for
every read and write (you can pick your granularity, of course, so locks
can be on records, pages, tables, or databases, depending on concurrency

Maybe you are suggesting a more radical approach where a transaction
executes completely locally recording all reads, probes (records that
would have been read if they existed), and writes, then at commit time
sends the log to a commit sequencer that maintains global state and
either rejects the commit or broadcasts the log to all nodes to apply
locally. I suppose it could be made to work (the probes -- aka phantom
control -- are very, very tricky). However, the commit analysis must be
serialized. So either you're going to have to find a way to distribute
the function or face a central bottleneck that limits the performance.
But perhaps you can find a way around that problem.
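
To make the scheme concrete, here is a minimal sketch of such a commit
sequencer as I read the idea. Everything in it is hypothetical: the
names CommitLog and CommitSequencer, the per-key bookkeeping, and the
list standing in for the broadcast. Probes are treated as point lookups
by key only, so range predicates (the genuinely hard part of phantom
control) are not handled.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """What a transaction recorded while running locally (hypothetical)."""
    start_seq: int                              # last commit visible at start
    reads: set = field(default_factory=set)     # keys of records read
    probes: set = field(default_factory=set)    # keys looked up, not found
    writes: dict = field(default_factory=dict)  # key -> new value


class CommitSequencer:
    """Validates commits one at a time; this is the serialized step."""

    def __init__(self):
        self.seq = 0
        self.last_writer = {}  # key -> sequence number of last committed write
        self.broadcast = []    # stand-in for sending the log to all nodes

    def commit(self, log):
        # Reject if anything this transaction read, probed, or wrote was
        # written by a transaction that committed after this one started.
        touched = log.reads | log.probes | set(log.writes)
        for key in touched:
            if self.last_writer.get(key, 0) > log.start_seq:
                return None    # conflict: the node must roll back
        self.seq += 1
        for key in log.writes:
            self.last_writer[key] = self.seq
        self.broadcast.append((self.seq, log.writes))
        return self.seq


sequencer = CommitSequencer()
t1 = CommitLog(start_seq=0, reads={"a"}, writes={"a": 1})
t2 = CommitLog(start_seq=0, probes={"a"}, writes={"b": 2})
print(sequencer.commit(t1))  # 1
print(sequencer.commit(t2))  # None: "a" was written after t2 started
```

Note that commit() has to run as a single critical section over the
global last_writer table, which is exactly the central bottleneck
described above.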

Jim Starkey
NimbusDB, Inc.
978 526-1376
