firebird-architect - Firebird and cluster

Subject	Firebird and cluster
Author	Evgeny Putilin
Post date	2007-07-28T15:54:27Z

Hi All.
Some thoughts on Firebird and clustering.

Preface:
Why do we want a cluster? To solve two problems:
- load balancing
- failover
Ideally both problems are solved together, but solving each one separately also makes sense.
We already have a failover cluster; for details see http://www.ibase.ru/ibinstall/ib7cluster.htm.
I know two ways to build load balancing:
- based on shared storage (Oracle RAC as example)
- based on replication (MySQL cluster as example)
At the moment the open question is a cluster that supports load balancing, and ideally one that supports both features at once.
In discussions about building a cluster for Firebird, attention has focused on the classic engine with a distributed lock manager. This variant allows a cluster that supports both features. But it has drawbacks:
- There is no good way to solve the split-brain syndrome.
- It requires many changes in the engine code:
--- Adapting the FB lock manager to OpenDLM
--- Finding a solution for delivering events
--- Problems with the page cache: a written page must become visible on all nodes of the cluster

Main idea:
I suggest considering a Firebird cluster based on replication. There are already many replication solutions for Firebird databases, but they are external to the engine. My proposal is replication in which the engine itself participates. Changes to the engine should be minimal, with all other functionality moved into a separate entity: the replication agent. It can be part of the engine, a loadable module (.so or .dll), or an external process. The best variant is a loadable module, since it separates development of the replication agent from changes to the engine code. It also allows third-party replication agents.
List of new functionality in the engine core:
- a special flag for an attachment; if the flag is set, replication and triggers are disabled for this attachment.
- an event manager that notifies an external listener of each operation from this list:
-----insert/update/delete of a record
-----start and rollback of a savepoint
-----commit and rollback of a transaction
- a special timestamp field that holds the time of the record's last modification
All other functionality should be in the replication agent.
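
To make the split between engine and agent more concrete, here is a rough sketch in Python. It is only an illustration, not engine code: the names Op, Event, EventManager and ReplicationAgent are invented for this example, and the real engine hook would of course be a C/C++ interface. It assumes the agent buffers row changes per transaction, drops the changes made after a savepoint when that savepoint is rolled back, and ships the buffer only on commit.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Op(Enum):
    INSERT = auto()
    UPDATE = auto()
    DELETE = auto()
    SAVEPOINT_START = auto()
    SAVEPOINT_ROLLBACK = auto()
    COMMIT = auto()
    ROLLBACK = auto()


@dataclass
class Event:
    tx_id: int
    op: Op
    table: Optional[str] = None
    row: Optional[dict] = None


class EventManager:
    """Hypothetical engine-side hook that forwards operations to listeners."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def notify(self, event, replication_disabled=False):
        # An attachment opened with the special flag produces no events,
        # so changes applied by an agent are not replicated back.
        if replication_disabled:
            return
        for listener in self._listeners:
            listener(event)


class ReplicationAgent:
    """Hypothetical listener: buffers changes and ships them on commit."""

    def __init__(self):
        self.pending = {}     # tx_id -> list of row-change events
        self.savepoints = {}  # tx_id -> stack of positions in that list
        self.shipped = []     # stands in for "sent to the subscribers"

    def __call__(self, event):
        tx = event.tx_id
        changes = self.pending.setdefault(tx, [])
        marks = self.savepoints.setdefault(tx, [])
        if event.op in (Op.INSERT, Op.UPDATE, Op.DELETE):
            changes.append(event)
        elif event.op is Op.SAVEPOINT_START:
            marks.append(len(changes))
        elif event.op is Op.SAVEPOINT_ROLLBACK:
            if marks:
                # Drop everything recorded since the innermost savepoint.
                del changes[marks.pop():]
        else:  # COMMIT or ROLLBACK ends the transaction
            if event.op is Op.COMMIT:
                self.shipped.extend(changes)
            del self.pending[tx]
            del self.savepoints[tx]
```

With this shape the engine only needs to call notify() at the listed points; buffering, transport and conflict handling all stay in the agent.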

Terminology:
master server: sender of replication messages
subscriber server: receiver of replication messages

Any server can be a master or a subscriber. If we have two servers, the first a master and the second a subscriber, we call this one-way replication. If we have two servers and each is both master and subscriber, we call this bi-directional replication.
Functionality which can be implemented in the replication agent:
1. Synchronous and asynchronous replication. With synchronous replication the master can wait either for each I/U/D operation to finish on the subscriber or for the commit to finish on the subscriber. With asynchronous replication the master continues working without waiting for any operation on the subscriber.
2. In case of a network break or split-brain syndrome, synchronous replication can switch to asynchronous on timeout. During a long communication break, asynchronous replication can store the redo log on disk; if the disk is in danger of overflowing, replication can be stopped, and synchronization is then restored manually.
3. Asynchronous replication can lead to simultaneous updates of the same data on several hosts. We should provide a mechanism for automatic or manual synchronization of data. The manual mechanism should make it possible to resynchronize after synchronization has been stopped for any reason.
4. Some data represents an accumulated state, for example an account balance or the quantity of items in storage. For such fields the replication message should contain the delta of the change. For other types of fields it is necessary to send the absolute value.
5. To resolve conflicts between simultaneous changes of data, it is necessary to introduce a special field type containing the time of the last change of the record (maintained by the engine).
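
Points 4 and 5 can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the set DELTA_COLUMNS, the message layout, and the rule "the newer timestamp wins" for absolute values are one possible policy, not a fixed design. Deltas are always applied because they commute; absolute values are applied only if the message is newer than the local record.

```python
# Hypothetical set of columns that are replicated as deltas.
DELTA_COLUMNS = {"balance", "quantity"}


def build_message(changes, modified_at):
    """changes maps column -> (old_value, new_value) on the master."""
    fields = {}
    for column, (old, new) in changes.items():
        if column in DELTA_COLUMNS:
            fields[column] = ("delta", new - old)
        else:
            fields[column] = ("abs", new)
    return {"modified_at": modified_at, "fields": fields}


def apply_message(row, message):
    """Apply a replication message to a local row (a dict) on a subscriber."""
    # Assumed policy: on equal timestamps the local value is kept.
    newer = message["modified_at"] > row["modified_at"]
    for column, (kind, value) in message["fields"].items():
        if kind == "delta":
            row[column] += value  # deltas commute, so always apply them
        elif newer:
            row[column] = value   # absolute value: the last change wins
    if newer:
        row["modified_at"] = message["modified_at"]
    return row


# Two hosts change the same record while replication is asynchronous.
host_a = {"balance": 100, "name": "a", "modified_at": 10}
host_b = dict(host_a)

msg_a = build_message({"balance": (100, 70), "name": ("a", "b")}, modified_at=20)
host_a.update(balance=70, name="b", modified_at=20)

msg_b = build_message({"balance": (100, 150), "name": ("a", "c")}, modified_at=30)
host_b.update(balance=150, name="c", modified_at=30)

apply_message(host_a, msg_b)
apply_message(host_b, msg_a)
```

In this example both hosts end with the same record: the two balance changes are summed and the later change of name wins. Equal timestamps on different hosts would need an additional tie-break rule, which this sketch does not cover.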
A cluster based on replication solves both problems:
-- failover: if one server dies, the second is still alive.
-- load balancing: all servers work independently of each other.
We have an open question: how do clients connect to different servers? We can use a hardware or a software solution. For a software solution we must change the client. After connecting to one server of the cluster, the client retrieves the addresses of all available servers and stores them. If the server dies, the client connects to another live server. Load balancing is a more complex task. One solution is for the client to divide the connections from its pool between servers, or to use another acceptable algorithm.
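
The client-side variant could look roughly like the following Python sketch. It is not the Firebird client API: ClusterClient, the connect callable and the list_servers callable are hypothetical, and round-robin is used only as the simplest example of dividing connections between servers. The fake functions at the end stand in for a real network layer.

```python
class ClusterClient:
    """Hypothetical client that remembers all cluster servers."""

    def __init__(self, seed_address, connect, list_servers):
        # connect(address) returns a connection or raises ConnectionError.
        # list_servers(connection) returns the addresses of all servers.
        self._connect = connect
        seed = connect(seed_address)
        self.servers = list(list_servers(seed))
        self._next = 0

    def get_connection(self):
        # Round-robin over the stored addresses, skipping dead servers.
        for _ in range(len(self.servers)):
            address = self.servers[self._next % len(self.servers)]
            self._next += 1
            try:
                return self._connect(address)
            except ConnectionError:
                continue
        raise ConnectionError("no live server in the cluster")


# Fake network layer for the demonstration: a "connection" is just the address.
alive = {"srv1", "srv2", "srv3"}


def fake_connect(address):
    if address not in alive:
        raise ConnectionError(address)
    return address


def fake_list_servers(connection):
    return ["srv1", "srv2", "srv3"]


client = ClusterClient("srv1", fake_connect, fake_list_servers)
first_three = [client.get_connection() for _ in range(3)]  # one per server
alive.discard("srv1")                                      # srv1 dies
after_failure = client.get_connection()                    # skips srv1
```

A real client would also have to refresh the stored list when servers join or leave the cluster; that part is left out here.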
For DDL (metadata changes) we must work only in synchronous replication mode. If synchronous replication mode is unavailable, the cluster switches to manual restoration of synchronization.
Generators are not replicated.

WBR Evgeny Putilin