Subject Re: [Firebird-Architect] Re: Special Relativity and the Problem of Database Scalability
Author Jim Starkey
Paul Ruizendaal wrote:
>> Maybe you are suggesting a more radical approach ...
>>
>
> Yes, as I've mentioned before, I'm looking at something based on the Kemme
> research, Stonebraker's "rewrite" observations and some other intriguing
> papers of the last half decade. The Kemme research summary is here:
> http://www.cs.mcgill.ca/~kemme/papers/icde05.pdf
> Please read past the Postgres details that she used in her proof of
> concept implementation -- they're irrelevant to her conceptual ideas.
>
>
>> ... where a transaction
>> executes completely locally recording all reads, probes (records that
>> would have been read if they existed), and writes, then at commit time
>> sends the log to a commit sequencer who maintains global state, and
>> either rejects the commit or broadcasts the log to all nodes to apply
>> locally.
>>
>
> Yes that is pretty much the idea, although in the Kemme work the "commit
> sequencer" is distributed.
>
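
To make the shape of that commit-time log concrete, here is a minimal
sketch in C -- every name below is illustrative, not Kemme's actual
format or any implementation's:

    #include <stdint.h>

    /* One entry per record read; the version lets the sequencer
       detect that a concurrent committer changed the record. */
    typedef struct {
        uint32_t table_id;
        uint64_t record_id;
        uint64_t version_seen;
    } read_entry_t;

    /* One entry per probe: a key looked up but not found.  Needed
       for phantom control -- the commit must fail if a matching
       record has since appeared. */
    typedef struct {
        uint32_t table_id;
        uint64_t key_hash;
    } probe_entry_t;

    /* One entry per write; the new record image would follow. */
    typedef struct {
        uint32_t table_id;
        uint64_t record_id;
    } write_entry_t;

    /* The log shipped at commit time: the full read, probe, and
       write sets, plus the snapshot the transaction ran against. */
    typedef struct {
        uint64_t       txn_id;
        uint64_t       start_snapshot;
        uint32_t       n_reads, n_probes, n_writes;
        read_entry_t  *reads;
        probe_entry_t *probes;
        write_entry_t *writes;
    } commit_log_t;

The sequencer either rejects the transaction or assigns it a global
position and broadcasts its write set for every node to apply.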
>
>> I suppose it could be made to work (the probes -- aka phantom
>> control -- are very, very tricky). However, the commit analysis must be
>> serialized. So either you're going to have to find a way to distribute
>> the function or face a central bottleneck that limits the performance.
>> But perhaps you can find a way around that problem.
>>
>
> As Roman pointed out there is the Paxos algorithm, two pages of pseudo
> code in a textbook and about 3,000 lines of C code in the Google
> implementation. The Spread group communication system (GCS) library is
> well established and only some 25,000 lines of C code. Yes, this stuff
> is very tricky to get right, but luckily it has been solved both
> theoretically and in proven open source code.
>
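
For what it's worth, the heart of those two pages of pseudo code is the
single-decree acceptor. A hedged sketch in C (state layout and names
are mine, not Google's or Spread's):

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint64_t ballot;           /* highest ballot promised so far */
        uint64_t accepted_ballot;  /* ballot of last accepted value */
        int64_t  accepted_value;   /* e.g. a commit-log sequence slot */
        bool     has_accepted;
    } acceptor_t;

    /* Phase 1: on PREPARE(b), promise to ignore lower ballots and
       report any previously accepted value back to the proposer. */
    bool on_prepare(acceptor_t *a, uint64_t b,
                    uint64_t *out_ballot, int64_t *out_value,
                    bool *out_has)
    {
        if (b <= a->ballot)
            return false;          /* NACK: promised a higher ballot */
        a->ballot   = b;
        *out_ballot = a->accepted_ballot;
        *out_value  = a->accepted_value;
        *out_has    = a->has_accepted;
        return true;               /* PROMISE */
    }

    /* Phase 2: on ACCEPT(b, v), accept unless a higher ballot has
       been promised in the meantime. */
    bool on_accept(acceptor_t *a, uint64_t b, int64_t v)
    {
        if (b < a->ballot)
            return false;          /* NACK */
        a->ballot          = b;
        a->accepted_ballot = b;
        a->accepted_value  = v;
        a->has_accepted    = true;
        return true;               /* ACCEPTED */
    }

A value is chosen once a majority of acceptors accept the same ballot;
the proposer side and the failure handling are where the remaining
trickiness lives.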
>
>> A transaction 1 on node A may not "see" all updates that occurred before
>> it started. A transaction 2 on node B might have started, updated the
>> record, and committed, but the reports of some or all of these might not
>> have reached node A before transaction 1 started. This is OK because
>> the resolution agent is going to tell transaction 1 that transaction 2
>> beat him to the record.
>>
>
> The role of your 'resolution agents' and my 'distributed commit sequencer'
> is pretty much the same. The question is which approach leads to the
> minimum communication and maximum parallelism. Kemme ends up with a total
> order on all write transactions and no order on read transactions. I think
> that the total order on all write transactions can be further relaxed,
> but I don't have that nailed down yet.
>
No, they aren't alike at all. A resolution agent resolves and preserves
the resolution of a potential update. The commit sequencer works after
the fact to see whether transactions can be sequenced, not whether
constraints have been honored. The information required by the two
mechanisms is radically different. To resolve a pending update, a
resolution agent needs only the table id, record id, and transaction
id. A commit sequencer needs to know the full read and write sets, so
if a transaction counted a million records, the commit sequencer needs
to know not only which million records were read but also which records
weren't read (e.g. records that hadn't arrived at that node yet). A
resolution agent is computationally trivial. A commit sequencer
requires a vast amount of data and is incredibly complex.
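
The contrast is easy to see in code. Below is a toy resolution agent --
a sketch under the assumption of first-comer-wins on (table id, record
id), with a linear scan where a real agent would use a hash table:

    #include <stdint.h>

    typedef struct {
        uint32_t table_id;   /* the three fields above -- nothing else */
        uint64_t record_id;
        uint64_t txn_id;
    } resolve_request_t;

    typedef enum { GRANTED, REJECTED } resolution_t;

    /* Toy pending-update table; a real agent would hash. */
    #define MAX_PENDING 1024
    static resolve_request_t pending[MAX_PENDING];
    static int n_pending;

    resolution_t resolve(const resolve_request_t *r)
    {
        for (int i = 0; i < n_pending; i++)
            if (pending[i].table_id == r->table_id &&
                pending[i].record_id == r->record_id)
                return pending[i].txn_id == r->txn_id
                       ? GRANTED : REJECTED;
        if (n_pending == MAX_PENDING)
            return REJECTED;            /* table full -- punt */
        pending[n_pending++] = *r;      /* first transaction wins */
        return GRANTED;
    }

Entries would be dropped when the owning transaction commits or rolls
back; the point is that the agent never needs read sets, probe sets, or
any global ordering information.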

The question is to what end? The resolution agents preserve consistency.
A commit sequencer, while also preserving consistency, has to do a
vast amount of work to enforce serializability, a completely
unnecessary characteristic.

The fundamental question is whether serializability is a requirement
for anything. I argue that it is not. All serializability buys you is
not having to express and enforce explicit database constraints. It is
unnecessary and an impediment to database scalability.

>
>> Yes, the resolution agents can be said to enforce serialization of
>> individual updates, but they don't force a serialization of
>> transactions.
>>
>
> We need to be precise here. We have already established that resolution
> agents enforce a consistent ordering on conflicting update transactions.
> Do
> you mean by the above that no ordering is imposed on non-conflicting
> updates, or that you disagree that conflicting updates get a consistent
> order?
>
> I think you need to be careful using the term 'serialization' as it
> suggests (at least to me) an identical sequence on all nodes. In my view
> each node can have its individual serialization and the whole can still be
> a consistent RDBMS. The ordering terminology of GCSs allows us to be
> precise about the tolerable differences in serialization on each node.
>
> The question (again) is which approach leads to the minimum communication
> and maximum parallelism. In my view establishing the minimum required
> ordering is the first step in answering that question.
>
> It sure is a pity I can't attend your talk.
>
>


--
Jim Starkey
Founder, NimbusDB, Inc.
978 526-1376


