Subject Re: [Firebird-Architect] Re: Cloud databases
Author Roman Rokytskyy
> Perhaps, perhaps not.
>
> - How will you partition the data? Is this a user/dba decision or is it
> something the software does by itself?

I would go for auto-configuring software, though it is not trivial.
I would replicate each data set on at least three nodes, so that when
one node goes down, the other two can still serve requests.
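
To make the three-replica idea concrete, here is a minimal sketch of how the software could pick the nodes for a data set without asking the DBA. It uses rendezvous (highest-random-weight) hashing, which is my assumption for illustration and not something I have settled on; the names replica_nodes, serving_nodes and REPLICAS are hypothetical.

```python
import hashlib

REPLICAS = 3  # assumption: "at least three nodes" taken as the replication factor


def _score(node, partition_id):
    # Stable per-(node, partition) weight; the same inputs always give the same score.
    digest = hashlib.sha256(f"{node}:{partition_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replica_nodes(partition_id, nodes, replicas=REPLICAS):
    """Pick the nodes that should hold a copy of the given partition."""
    if len(nodes) < replicas:
        raise ValueError("not enough nodes for the requested replication factor")
    ranked = sorted(nodes, key=lambda n: _score(n, partition_id), reverse=True)
    return ranked[:replicas]


def serving_nodes(partition_id, nodes, down):
    """Replicas that can still answer requests when some nodes are down."""
    return [n for n in replica_nodes(partition_id, nodes) if n not in down]
```

With five nodes and one replica holder down, serving_nodes still returns the two remaining holders, which is the property I am after.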

> - How does the SQL compiler/relational engine know where to find a
> particular piece of data? Do you partition the queries/updates as well?

That is easy - every node in the group keeps a replica of the partition
information. You can update it dynamically.
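
A rough sketch of what I mean, assuming range partitioning on a key and a version number on the map so that nodes can tell a newer map from an older one. Both are assumptions for illustration, and PartitionMap and Node are hypothetical names.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class PartitionMap:
    version: int = 0
    # Sorted split points; range i covers keys from bounds[i-1] up to (not including) bounds[i].
    bounds: list = field(default_factory=list)
    # owners[i] lists the nodes holding range i; there is one more range than split points.
    owners: list = field(default_factory=list)

    def lookup(self, key):
        return self.owners[bisect.bisect_right(self.bounds, key)]


class Node:
    def __init__(self, name, pmap):
        self.name = name
        self.pmap = pmap  # every node keeps its own full copy of the map

    def receive(self, pmap):
        """Accept a map broadcast by another node only if it is newer."""
        if pmap.version > self.pmap.version:
            self.pmap = pmap
            return True
        return False

    def route(self, key):
        """The relational engine asks the local copy where a key lives."""
        return self.pmap.lookup(key)
```

Since the lookup is local, the compiler never needs a network round trip to find out where a piece of data lives; only changes to the map have to be propagated.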

> - what happens if a few new machines enter the cloud? How will data get
> repartitioned?

I have not thought a lot about this, but I would:

a) select the candidates for splitting.
b) copy the split data to the new node(s)
c) propagate the updates that happened while copying the data to the new nodes
d) drop the data sets that are no longer needed.
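
The four steps above can be sketched with plain dictionaries standing in for the nodes. This is only an illustration under my own assumptions: the split is by a single key boundary, updates arriving during the copy are logged as (key, value) pairs, and a value of None means a delete. The name split_partition is hypothetical.

```python
def split_partition(source, new_node, split_key, updates_during_copy):
    """Move all keys >= split_key from source to new_node."""
    # a) select the candidates for splitting
    snapshot = {k: v for k, v in source.items() if k >= split_key}

    # b) copy the split data to the new node
    new_node.update(snapshot)

    # The source keeps serving writes while the copy runs; log them.
    log = []
    for key, value in updates_during_copy:
        if value is None:
            source.pop(key, None)
        else:
            source[key] = value
        log.append((key, value))

    # c) propagate the updates that happened during the copy
    for key, value in log:
        if key < split_key:
            continue  # stays on the source, nothing to replay
        if value is None:
            new_node.pop(key, None)
        else:
            new_node[key] = value

    # d) drop the data that the source no longer needs
    for key in [k for k in source if k >= split_key]:
        del source[key]
```

A real implementation would of course have to block or redirect writes for a short moment between steps c) and d).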

An optimization should be possible if new record versions were stored
on the new nodes only.

> what happens if a few crash?

Select nodes that can handle new copies and propagate them there. If no
node can handle the load of a complete replica, act as if we were
splitting the existing partition.
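
As a sketch of that decision, assuming each node reports its free capacity in the same unit as the partition size and that a split produces two equal pieces (both are my simplifications; plan_recovery is a hypothetical name):

```python
def plan_recovery(partition_size, free_capacity, survivors, pieces=2):
    """Decide where to re-create a replica that was lost in a crash.

    free_capacity maps node name to free space; survivors are the nodes
    that still hold a copy and therefore must not receive another one.
    """
    spare = {n: f for n, f in free_capacity.items() if n not in survivors}
    ranked = sorted(spare, key=lambda n: spare[n], reverse=True)

    # Preferred case: some node can take the complete replica.
    if ranked and spare[ranked[0]] >= partition_size:
        return ("copy", [ranked[0]])

    # Otherwise act as if we were splitting the partition.
    piece_size = -(-partition_size // pieces)  # ceiling division
    fitting = [n for n in ranked if spare[n] >= piece_size]
    if len(fitting) >= pieces:
        return ("split", fitting[:pieces])

    # Nothing fits: keep running with fewer replicas until nodes are added.
    return ("wait", [])
```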

> I think that using a stable hash as partitioning algorithm will go a long
> way, but a clear & simple algorithm/architecture hasn't gelled in my
> mind. The brute force approach of storing everything everywhere just
> seems so much simpler.

True, but in this case you do not need to implement a lot, just take
Sequoia and run your cluster. You have dynamic addition of nodes there
too; it requires a database adapter that implements the backup/restore
service.

Roman