Subject Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability
Author Paul Ruizendaal
> Maybe you are suggesting a more radical approach ...

Yes, as I've mentioned before, I'm looking at something based on the Kemme
research, Stonebreaker's "rewrite" observations and some other intriguing
papers of the last half decade. The Kemme research summary is here:
Please read past the Postgres details that she used in her proof of
concept implementation -- it's irrelevant to her conceptual ideas.

> ... where a transaction
> executes completely locally recording all reads, probes (records that
> would have been read if they existed), and writes, then at commit time
> sends the log to a commit sequencer who maintains global state, and
> either rejects the commit for broad casts the log to all nodes to apply
> locally.

Yes that is pretty much the idea, although in the Kemme work the "commit
sequencer" is distributed.

> I suppose it could be made to work (the probes -- aka phantom
> control -- are very, very tricky). However, the commit analysis must be
> serialized. So either you're going to have to find a way to distribute
> the function or face a central bottleneck that limits the performance.
> But perhaps you can find a way around that problem.

As Roman pointed out there is the Paxos algorithm, two pages of pseudo
code in a textbook and about 3,000 lines of C code in the Google
implementation. The "spread" GCS library is quite established and only
some 25,000 lines of C code. Yes, this stuff is very tricky to get right,
but luckily it has been solved both theoretically and with proven open
source code.

> A transaction 1 on node A may not "see" all updates that occurred before
> it started. A transaction 2 on node B might have started, updated the
> record, and committed, but the reports of some or all of these might not
> have reached node A before transaction 1 started. This is OK because
> the resolution agent is going to tell transaction 1 that transaction 2
> beat him to the record.

The role of your 'resolution agents' and my 'distributed commit sequencer'
is pretty much the same. The question is which approach leads to the
minimum communication and maximum parallelism. Kemme ends up with a total
order on all write transactions and no order on read transactions. I think
that the total order on all write transactions can be further relaxed,
but I don't have that nailed down yet.

> Yes, the resolution agents can be said to enforce serialization of
> individual updates, but they don't force a serialization of
> transactions.

We need to be precise here. We have already established that resolution
agents enforce a consistent ordering on conflicting update transactions.
you mean by the above that no ordering is imposed on non-conflicting
updates, or that you disagree that conflicting updates get a consistent

I think you need to be careful using the term 'serialization' as it
suggests (at least to me) an identical sequence on all nodes. In my view
each node can have its individual serialization and the whole can still be
a consistent rdbms. The ordering terminology of GCS's allows us to be
precise about the tolerable differences in serialization on each node.

The question (again) is which approach leads to the minimum communication
and maximum parallelism. In my view establishing the minimum required
ordering is the first step in answering that question.

It sure is a pity I can't attend your talk.