Subject Re: [Firebird-Architect] Special Relativity and the Problem of Data Scalability, Take 2
Author Jim Starkey
Paul Ruizendaal wrote:
> On Fri, 19 Mar 2010 15:45:41 -0400, Jim Starkey <jstarkey@...>
> wrote:
>> Jim Starkey wrote:
>>> Here are the slides from my IEEE/ACM talk last night. We've talked
>>> about much of this stuff before, but here it is in a single place.
> Thanks for providing the summary. I hope your audience was on board the
> clue train and that a lively discussion ensued.
> On page 21 you introduce a P2P fused cache (caching "atoms"). What is the
> role of the "atom chairman" and what happens if that goes down?
The chairman does things like handing out ids and serializing
things that need to be serialized, like updates to a particular record.

The line of succession is deterministic, so everyone knows where to go next.
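A minimal sketch of what a deterministic line of succession could look like (the names and structure here are my illustration, not NimbusDB's actual implementation): every node holds the same ordered membership list for an atom, so the chairman is simply the first member still alive, and everyone computes the same successor without any election.

```python
def chairman(members, live):
    """Return the current chairman for an atom: the first node in the
    agreed, fixed ordering (e.g. join order) that is still alive.
    Because every node walks the same list, succession is deterministic."""
    for node in members:
        if node in live:
            return node
    raise RuntimeError("no live members for this atom")

members = ["node-a", "node-b", "node-c"]   # agreed ordering, same everywhere
print(chairman(members, {"node-a", "node-b", "node-c"}))  # node-a
print(chairman(members, {"node-b", "node-c"}))            # node-b, after node-a dies
```

The point of the fixed ordering is that no messages are needed to agree on the new chairman; each survivor reaches the same answer independently.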
> Pages 29 and 32 seem to suggest a variant of 2PC to commit across nodes,
> with the variation being that instead of 2-phase committing between nodes,
> you are 2-phase committing between redundant commit agents ("coteries" in
> your parlance). Is that what you meant?
There are three possible commit strategies:

1. A transactional node declares itself committed. Very efficient,
but there's a tiny possibility that if a node died immediately
after the commit message, a piece of data could be lost.
2. An archive node declares a transaction committed when it knows it
has all data from a transactional node. This works unless the
network is suddenly partitioned, in which case a client could see
a successful commit even though the commit didn't reach the
surviving partition and was rolled back.
3. A transaction isn't committed until a commit agent in every
coterie has seen the pre-commit. This sounds cumbersome, but
really isn't. It does, however, incur some latency.

We use #1 for debugging all the time, but it probably won't be a user
option. We'll probably offer a choice between #2 and #3. Choruses
contained within a single data center don't need #3, but geographically
dispersed choruses may want it. The real reason for including it is
so that I don't have to argue about it anymore.
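Strategy #3 can be sketched in a few lines (this is my illustration of the rule as stated, with assumed names): a transaction may commit only once at least one commit agent in every coterie has acknowledged the pre-commit.

```python
def can_commit(coteries, acks):
    """coteries: list of sets of commit agents; acks: the set of agents
    that have acknowledged the pre-commit. The transaction commits only
    if every coterie contains at least one acknowledging agent."""
    return all(coterie & acks for coterie in coteries)

coteries = [{"a1", "a2"}, {"a2", "a3"}, {"a3", "a4"}]
print(can_commit(coteries, {"a2", "a3"}))  # True: every coterie heard the pre-commit
print(can_commit(coteries, {"a1"}))        # False: a1 reaches only the first coterie
```

The latency cost mentioned above is just the round trip to the slowest coterie; no second phase is needed once every coterie has seen the pre-commit.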
> On page 32 you have a line about network partitions: "the partition that
> contains a coterie survives". If you mean by that that only one partition
> can survive, that places restrictions on how "coteries" are formed. If
> multiple survivors are allowed, how do you propose to handle reconnection
> of two or more partitions? Note that on a heavily loaded system, messages
> may time out, leading a system to think there are intermittent network
> partitions.
There are no disjoint coteries, and any coteries with common nodes are
by definition part of the same partition. This guarantees that only one
partition can contain a coterie.
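The "no disjoint coteries" invariant is easy to check mechanically; here is a small sketch (my own illustration): if every pair of coteries shares at least one node, then two separated partitions can never each contain a complete coterie.

```python
from itertools import combinations

def valid_coterie_set(coteries):
    """True if every pair of coteries intersects, i.e. no two are
    disjoint. Under this invariant, at most one network partition
    can contain an entire coterie and thus survive."""
    return all(a & b for a, b in combinations(coteries, 2))

print(valid_coterie_set([{1, 2, 3}, {3, 4, 5}, {1, 3, 5}]))  # True
print(valid_coterie_set([{1, 2}, {3, 4}]))                   # False: disjoint pair
```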

There are lots of interesting coterie topologies. The obvious ones
involve a majority or half the machines. But it is also possible to have
a topology that involves only three or five critical servers in the home
> By the way, wouldn't it be clearer if you named "coteries" something like
> "commit agent groups", or "commit agent clusters", or perhaps "commit
> groups" or "commit clusters" for short? Perhaps "objects" in place of
> "atoms"? Perhaps "active node set" or "active partition" instead of
> "chorus"?
I didn't pick the name. Coteries were invented and named by a very
bright fellow named Hector Garcia-Molina, once chairman of the Stanford
Computer Science Department. He deserves full credit and I want to
honor his creation.

But thanks for the suggestion. The deepest honor an inventor gets is an
argument only over the name...

(Paul, are you aware that Thunderbird wants to change your name to
"vandalized" or "scandalize"? What bad company it must think I keep.)

> Regards,
> Paul

Jim Starkey
NimbusDB, Inc.
978 526-1376
