Subject | Re: RFC: Clustering |
---|---|
Author | Roman Rokytskyy |
Post date | 2006-11-20T20:44:42Z |
> This architecture is fraught with many problems, each of which must
> be solved.
:) Academics call it "state machine replication". Interestingly
enough, it is widely used in distributed systems, but I have never
heard of it being used in databases. If I remember correctly, one
of the issues was performance.
> The hardest is how to make sure the servers execute all in
> the same precise order. Without this, the scheme can't work.
> I don't have any idea of how you might do this, but perhaps
> you have a solution.
This is solvable, at least on a LAN. One can use a reliable
multicast protocol stack with total ordering, which would guarantee
that the messages are delivered to all nodes in the same order. WANs
have higher latencies, so reliable multicast is not that realistic there.
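
To make the ordering guarantee concrete, here is a minimal sketch of one common way to get total order: a single sequencer stamps every message, and each node delivers strictly by stamp. The names `Sequencer` and `Replica` are made up for illustration, and a real protocol stack would also have to handle message loss and sequencer failure, which this sketch ignores.

```python
import itertools


class Sequencer:
    """Hypothetical central sequencer: assigns one global number per message."""

    def __init__(self):
        self._counter = itertools.count()

    def stamp(self, message):
        return (next(self._counter), message)


class Replica:
    """Delivers stamped messages in sequence order, buffering early arrivals."""

    def __init__(self):
        self.next_seq = 0   # next sequence number we are allowed to deliver
        self.pending = {}   # messages that arrived ahead of their turn
        self.log = []       # messages in delivery order

    def receive(self, stamped):
        seq, message = stamped
        self.pending[seq] = message
        # Deliver as long as the next expected message is available.
        while self.next_seq in self.pending:
            self.log.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def demo():
    sequencer = Sequencer()
    stamped = [sequencer.stamp(s) for s in ("INSERT", "UPDATE", "DELETE")]
    a, b = Replica(), Replica()
    for m in stamped:            # node a sees the network order as sent
        a.receive(m)
    for m in reversed(stamped):  # node b sees the messages reordered
        b.receive(m)
    return a.log, b.log
```

Both nodes end up with the same log even though the network delivered the messages in different orders. The extra hop through the sequencer is one reason why latency matters so much for this approach.
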
> Second, there is a problem of non-deterministic behaviors such as a
> random number generator or translation of the manifest constant
> "now". Each will yield different results on different system and the
> servers will diverge.
If such code is used, the state machine replication cannot be used -
the execution becomes nondeterministic. As of 2002-2003, when I
finished researching that area, there was no solution and only one
workaround - "do update on one site and replicate the modified
records/pages to other nodes".
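
A small sketch of the difference between the two approaches, under the assumption that the nondeterministic part is a random-number call. All function names here are hypothetical, and a plain dict stands in for the database.

```python
import random


def execute_statement(db, rng):
    """A statement with a nondeterministic part (here: a random value)."""
    db["token"] = rng.random()


def replicate_by_reexecution():
    """State machine replication: every node runs the statement itself."""
    node_a, node_b = {}, {}
    execute_statement(node_a, random.Random(1))  # each node has its own RNG state
    execute_statement(node_b, random.Random(2))
    return node_a, node_b


def replicate_by_shipping_records():
    """Workaround: run the statement on one site, ship the modified records."""
    primary, node_a, node_b = {}, {}, {}
    execute_statement(primary, random.Random(1))
    modified = list(primary.items())  # the records the statement changed
    for node in (node_a, node_b):
        node.update(modified)         # nodes apply results, they do not re-execute
    return primary, node_a, node_b
```

In the first function the two nodes diverge, in the second all copies are identical because the random value was computed exactly once.
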
> Finally, I don't see any way for a node to rejoin the
> cluster without taking down the database. This rather defeats the
> scheme, doesn't it?
Well, in theory, alive nodes can detect the crash (a typical property of
the reliable multicast protocol, take it for granted). At that point
they can start writing the difference file like nbackup does. When the
node rejoins the group, only the difference must be replicated. So the
only issue is to rejoin the group soon, otherwise the difference might
get bigger than the database size.
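
Roughly, the rejoin idea could look like the sketch below. It is only an illustration: the `Node` class and its methods are invented, the "difference file" is simplified to a set of changed page numbers, and crash detection is assumed to come from the multicast layer.

```python
class Node:
    """Hypothetical surviving node that tracks changes while a peer is down."""

    def __init__(self, pages):
        self.pages = dict(pages)  # page number -> page contents
        self.dirty = None         # None: no peer is down, nothing to track

    def peer_crashed(self):
        # Called when the multicast layer reports a crash (assumed to exist).
        self.dirty = set()

    def write(self, page_no, data):
        self.pages[page_no] = data
        if self.dirty is not None:
            self.dirty.add(page_no)  # remember what the crashed peer missed

    def difference(self):
        """Pages changed since the crash - all a rejoining node needs."""
        return {no: self.pages[no] for no in (self.dirty or ())}

    def peer_rejoined(self):
        self.dirty = None


def rejoin(stale_pages, survivor):
    """Bring a stale copy up to date by applying only the difference."""
    fresh = dict(stale_pages)
    fresh.update(survivor.difference())
    survivor.peer_rejoined()
    return fresh
```

For example, if a survivor with ten pages changes two of them while the peer is down, `rejoin` transfers those two pages and not the whole database.
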
Roman