firebird-architect - Re: [Firebird-Architect] RFC: Clustering

Subject	Re: [Firebird-Architect] RFC: Clustering
Author	= m. Th =
Post date	2006-11-19T17:52:40Z

Eduardo Jedliczka wrote:

> Again you're right, but what if the cluster is Linux, Mac OS, or *BSD?
>
>

No problem. Alex (Peshkov) has been running remote shadows on NFS systems
for quite a while and, from what he has told us, his experience with them
is very positive.

> But some people need web replication, or a remote cluster (like Oracle does).
> Here, bandwidth is a great problem.
>
>

I don't understand exactly what you mean by 'web replicating' or
'remote cluster'. IF 'remote cluster' means that you have a workstation
located in Apucarana and you want to connect, over a slow web connection,
to a cluster which has ALL its nodes located in my small office, then
read about the 'cluster gateway' in my former document. It is a middle
tier / kind-of-proxy / router (name it as you wish) which is the only
thing that you 'see' from Brazil (and you think that it is a server), so
you will have _one_ connection with me and no speed loss over the slow
part of the connection. The 'gateway' will then send the commands to all
the nodes of the cluster, ensuring that the 'replication' (in your terms)
is done over a LAN connection which (as I stated) is very fast at
100 Mbit/s and transparent at 1 Gbit/s. But because this cluster gateway
is a new point of failure, it must be optional, i.e. used only in the
situations in which the developer needs it (like this one).
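
A minimal sketch of the gateway idea described above, in Python. The
names `Node`, `ClusterGateway` and `execute` are hypothetical and only
simulate the behaviour; this is not Firebird code. The remote client
talks to one endpoint, and the fan-out to all nodes happens behind it:

```python
from concurrent.futures import ThreadPoolExecutor


class Node:
    """Hypothetical stand-in for one cluster node; it only records commands."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def execute(self, command):
        self.log.append(command)
        return f"{self.name}: ok"


class ClusterGateway:
    """The only endpoint the remote client sees (assumed design)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def execute(self, command):
        # One command arrives over the slow link; the fan-out to every
        # node happens in parallel on the fast LAN side.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            return list(pool.map(lambda node: node.execute(command), self.nodes))


nodes = [Node("node1"), Node("node2"), Node("node3")]
gateway = ClusterGateway(nodes)
replies = gateway.execute("INSERT INTO T1 VALUES (1)")
```

After the call, every node has logged the same command, while the
client has used a single connection.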

But IF 'remote cluster' means 'remote cluster NODES' with multiple
data access paths, this is just asking for trouble, IMHO. What I mean: if
my cluster is scattered, with, let's say, one node in my office and
another node in Canada, it is very easy for you, located in Brazil,
to connect to the node in Canada _only_, and for Alex, located in
Russia, to connect to my node _only_ (which is nearer to him). Then,
indeed, the replication/synchronizing engine will be *very* complex and I
doubt that it can handle all the situations at all (think about
generators / autoincrement fields alone...). But if you have only one
data path to your cluster and the downstream segment is slow, then we are
again in the case of our 'cluster gateway'.
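
To illustrate the generator problem, here is a small Python simulation.
The `Generator` class is a hypothetical model of a per-node sequence,
not the Firebird implementation. Two nodes that accept inserts
independently hand out the same values:

```python
class Generator:
    """Hypothetical model of a generator (sequence) kept by a single node."""

    def __init__(self, start=0):
        self.value = start

    def next(self):
        self.value += 1
        return self.value


# Each node has its own copy of the generator and serves its own clients.
office = Generator()
canada = Generator()

ids_office = [office.next() for _ in range(3)]  # inserts made by Alex
ids_canada = [canada.next() for _ in range(3)]  # inserts made from Brazil

# The same key values were issued on both sides, so the rows clash on merge.
collisions = set(ids_office) & set(ids_canada)
```

Every value issued by one node was also issued by the other, so a
synchronizing engine would have to renumber rows or partition the value
ranges in advance.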

> Think about it: your cluster solution is the "best replication solution"
> ever made for Firebird. When it's implemented, I will certainly use it to do
> remote replication.
>

Replication between servers should be avoided IMHO because it brings
some undesirable side effects:
1. Network traffic is concentrated on the server's NIC.
2. Extensive use of the server's CPU and/or memory by the replication engine.
3. The delta time in which data is replicated. If the system fails during
this delta time, we have a desync between nodes, and you can't easily
tell which server holds the most recent data. (I had a similar problem
with a replication system. I speak from my small experience.)
4. Complex logic and implementation imposed by the serverless model,
which must be adopted.
5. Other issues which I don't remember now :) (look at my first message,
at the beginning, I think)
6. Ask Jim. He'll surely tell you some others.
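
Point 3 can be shown with a short Python simulation. It assumes an
asynchronous, queue-based replicator; the names `Server`, `commit` and
`replicate_once` are hypothetical:

```python
from collections import deque


class Server:
    """Hypothetical server that just keeps a list of committed rows."""

    def __init__(self):
        self.rows = []


primary, replica = Server(), Server()
pending = deque()  # changes committed on the primary but not yet replicated


def commit(row):
    primary.rows.append(row)
    pending.append(row)


def replicate_once():
    # The replicator ships one pending change per run.
    if pending:
        replica.rows.append(pending.popleft())


commit("A")
commit("B")
replicate_once()

# If the primary fails at this point, "B" exists only on the failed server.
lost = [row for row in primary.rows if row not in replica.rows]
```

Whatever is still in `pending` when the failure happens is exactly the
desync between the nodes.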

>> So, turning a
>> cluster node off, even for a 'short time', renders it unusable because of
>> the continuous flux of state changes (mainly contexts and TIDs) which must
>> be kept in sync between the nodes. In my small implementation, a node
>> can enter the cluster as active only when all the clients are out (see
>> above about 'Pending'). IMHO, I don't see (for now) a solution for
>> replicating live contexts across the servers.
>>
>
> Here we have a lot of needs, different problems to solve...
>
>

No, it is solved, if we implement this by allowing fbclient.dll to send
the commands to all nodes /simultaneously/. An interesting thing which I
see now, though, is the time needed to switch to another node/connection
thread and continue fetching data from there when fbclient kills the main
connection thread because of an 'unable to connect to host' error. But I
am waiting for the developers' opinion.
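
A rough Python sketch of the switch-over during a fetch. It assumes
that every node already holds the same result set because the commands
were sent to all of them; `NodeConnection` and `fetch_all` are
hypothetical names and nothing here is the fbclient API:

```python
class NodeConnection:
    """Hypothetical connection to one node; `fail_after` simulates a crash."""

    def __init__(self, name, rows, fail_after=None):
        self.name = name
        self.rows = rows
        self.fail_after = fail_after

    def fetch(self, position):
        if self.fail_after is not None and position >= self.fail_after:
            raise ConnectionError(f"unable to connect to host {self.name}")
        return self.rows[position]


def fetch_all(connections, row_count):
    """Fetch every row, moving to the next node when the current one fails."""
    result, active, position = [], 0, 0
    while position < row_count:
        try:
            result.append(connections[active].fetch(position))
            position += 1
        except ConnectionError:
            active += 1  # continue from the same position on another node
            if active == len(connections):
                raise  # no node left to switch to
    return result, connections[active].name


rows = ["r0", "r1", "r2", "r3"]
connections = [
    NodeConnection("node1", rows, fail_after=2),  # dies after two rows
    NodeConnection("node2", rows),
]
fetched, served_by = fetch_all(connections, len(rows))
```

The client still receives all four rows; the open question is how long
the switch itself takes in a real client library.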

> My need is: only one database (on one server or a cluster) giving support
> to dozens or hundreds of stations. Of course we have some "report stations"
> (clients with large reports to print, and sometimes with massive SQL use).
> And unluckily Firebird doesn't have a good structure to extract the power of
> the new dual-core (and new quad-core) processors. If one station starts a
> "big report", a lot of clients are left waiting.
>

Usually Firebird is very fast. Perhaps you can describe your problem
(with queries, configuration, etc.) in the firebird-support group. The
main goal of clustering in my implementation isn't to improve
throughput.

> If we have a "true cluster" or a "replicated database only for printing",
> this need is solved or minimized.
>
> This is my POV: the sync can be made by sending DDL/DML commands between
> cluster nodes, but again a sequential cache is needed... (like the redo log
> in Oracle)... Nodes in a cluster may have different performance... some
> nodes may be faster than others, and may have better network hardware (or
> fewer collision problems)
>
>

Perhaps you can look at some of the replication engines already available
on the Internet...

> My first idea to do this is "intercepting ALL communications between the
> true Firebird server and Firebird clients" (like ibmonitor) and doing the
> hard job (replication and routing connections to nodes) by myself. WHY?
> Easy: I can't change the Firebird source code... I'm not a good C developer.
>
>

I doubt that this is reasonable to do (from outside). And how would you
reroute all the data to the other nodes? Don't forget that you'll have to
deal not only with simple test commands like 'INSERT INTO T1...' but also
with parametrized inserts/queries which deal with binary data like
images, sound files, etc. Also, some things can be changed internally in
triggers and stored procedures, for example internal inserts, deletes,
updates and generators... good luck anyway!
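
A small illustration of both difficulties. It uses Python's built-in
sqlite3 module purely as a stand-in for a database server (it is not
Firebird), and the `execute` wrapper is a hypothetical text-only
interceptor:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, img BLOB);
    CREATE TABLE audit (t1_id INTEGER);
    CREATE TRIGGER t1_ai AFTER INSERT ON t1
    BEGIN
        INSERT INTO audit (t1_id) VALUES (NEW.id);
    END;
""")

captured = []  # everything a text-only interceptor would record


def execute(sql, params=()):
    captured.append(sql)       # the binary parameter is not part of the text
    conn.execute(sql, params)


execute("INSERT INTO t1 (img) VALUES (?)", (b"\x89PNG\x00\x01",))

# The trigger wrote to a second table without any statement on the wire.
audit_rows = conn.execute("SELECT COUNT(*) FROM audit").fetchone()[0]
```

The interceptor saw one statement with a placeholder, while the server
stored a binary value and changed a second table internally.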

hth,

m. th.