Subject Cloud database
Author paulruizendaal
Hi all,

Some time ago Jim discussed cloud databases. I came across a paper
that folks interested in that topic might find a good read: "Don't be
lazy, be consistent":
http://www.cs.mcgill.ca/~kemme/papers/vldb00.pdf

The paper discusses how a group of database servers can be kept
synchronised using multicast with total ordering. On (networking)
hardware that is antique by today's standards, the authors show that
the approach scales to 15 nodes or more. My guess is that it might
scale to 30-50 nodes on current hardware.
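
To make the total ordering idea concrete, here is a toy sketch in
Python. It is not the paper's protocol and it does not use Spread: the
Replica and TotalOrderGroup classes are names I made up, and a single
in-process sequencer stands in for the group communication system. It
only shows why applying write sets in one agreed order keeps all
copies identical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One database server holding a full copy of the data (hypothetical)."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # sequence number of the last write set applied

    def deliver(self, seq, write_set):
        # Total ordering means every replica sees seq = 1, 2, 3, ... with no gaps.
        assert seq == self.applied + 1, "gap or reordering in delivery"
        self.data.update(write_set)
        self.applied = seq


class TotalOrderGroup:
    """Stand-in for a group communication system: one central sequencer."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.next_seq = 1

    def multicast(self, write_set):
        # Assign the next global sequence number, then deliver to every member.
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            replica.deliver(seq, dict(write_set))
        return seq


replicas = [Replica(f"node{i}") for i in range(3)]
group = TotalOrderGroup(replicas)
group.multicast({"x": 1})          # write set of a transaction run on one node
group.multicast({"x": 2, "y": 5})  # write set of a transaction run on another
```

After the two multicasts every replica holds the same data, because
each one applied the same write sets in the same order. A real system
also has to handle conflicts, node failures and membership changes,
none of which is modelled here.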

The paper uses an old Postgres release as its base. The project has
recently been revived (http://postgres-r.org/) and a source patch
against current Postgres is about 400K, mostly new files. The group
communication system is Spread (http://www.spread.org/), which is a
200K server binary (typically 1 instance per machine) and a 100K
client lib (used by each process). Not a bad size for what it might
deliver.

Looking past implementation and wording, I was surprised to see how
many elements it has in common with (what I understand of) Jim's
approach:
- a separation into SQL nodes and storage nodes (this is muddled in
the paper, but imho it is there)
- all storage nodes hold all data
- confirmed replication is the commit (again the paper is confused on
the issue, but switching off fsyncing and using total ordering amounts
to the same)

The paper does not address data locality, but refers to future work
in this area. Note that Jim's approach of sending similar queries to
the same node(s) each time can be bolted on top of the approach of
the paper and will achieve the same improved cache hit rate.
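
A minimal sketch of such query affinity, under my own assumptions:
"similar" is taken to mean "same statement once literals are stripped",
and the functions query_shape and pick_node as well as the node names
are hypothetical. I do not know how Jim's design actually decides
which queries are similar.

```python
import hashlib
import re


def query_shape(sql):
    """Crude normalisation: lower-case, replace literals with '?'."""
    sql = sql.strip().lower()
    sql = re.sub(r"'[^']*'", "?", sql)  # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)  # stand-alone numeric literals
    return re.sub(r"\s+", " ", sql)     # collapse runs of whitespace


def pick_node(sql, nodes):
    """Map a query to a node; queries with the same shape share a node."""
    digest = hashlib.sha1(query_shape(sql).encode()).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


nodes = ["sql1", "sql2", "sql3"]  # hypothetical SQL node names
a = pick_node("SELECT * FROM orders WHERE id = 17", nodes)
b = pick_node("select *  from orders where id = 99", nodes)
# a and b are the same node, so its cache stays warm for this query shape
```

The routing sits entirely in front of the replicated servers, which is
why it can be bolted on without touching the replication protocol.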

As far as I am aware, none of that future work was actually done. The
most interesting missing piece imo is partitioning the data so that
not all nodes hold all data, but each piece of data is held on a
subset of the cloud. This would avoid the scalability limit that
arises when the storage nodes cannot keep up with writing the changes
to disk -- a problem that hobbled early releases of Falcon, if I
remember correctly.
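
To illustrate what I mean by partitioning, here is a sketch of one
possible placement scheme. It comes neither from the paper nor from
any existing system: the function replica_set, the "copies" parameter
and the node names are all made up. Each key is hashed to a starting
node and stored on that node plus the next few, so a write only
reaches a fixed-size subset of the storage nodes.

```python
import hashlib
from collections import Counter


def replica_set(key, nodes, copies=3):
    """Return the subset of nodes that should hold `key` (hypothetical scheme)."""
    if copies > len(nodes):
        raise ValueError("more copies requested than nodes available")
    digest = hashlib.sha1(key.encode()).digest()
    start = int.from_bytes(digest[:4], "big") % len(nodes)
    # Take `copies` consecutive nodes, wrapping around the end of the list.
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


nodes = [f"store{i}" for i in range(10)]  # hypothetical storage nodes
owners = replica_set("orders:42", nodes)  # the 3 nodes holding this key

# Count how many of 1000 writes each node would have to put on disk.
load = Counter()
for n in range(1000):
    load.update(replica_set(f"orders:{n}", nodes))
```

With full replication every node would write all 1000 changes; here
the 3000 copies are spread over the 10 nodes, so each node writes only
a fraction and adding nodes lowers the per-node disk load. The hard
part, which this sketch ignores completely, is keeping transactions
that touch several partitions consistent.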

If anybody remembers seeing related research, I would certainly be
interested to read it.

Cheers,

Paul