firebird-architect - Re: [Firebird-Architect] RFC: Clustering

Subject	Re: [Firebird-Architect] RFC: Clustering
Author	= m. Th =
Post date	2006-11-18T08:43:10Z

= m. Th = wrote:

>
> 2. How the cluster is created
>
> a. The goal
>
> The goal is to create identical copies of the database on each
> cluster node, having inside the list with the database files from the
> other nodes, stored in RDB$Files with the appropriate flags.
>
> b. A proposed way (using SQL)
>
> <snip>

Perhaps I have found an improvement to this; read below.

> On all the servers we’ll have in the RDB$Files (at least) the
> following three records:
>
> Server-A:C:\MyDB\DB01.FDB
> Server-B:C:\MyDB\DB01.FDB
> Server-C:C:\MyDB\DB01.FDB
>
> IMHO, in the flags field we can put the node status: Data server,
> on-line, offline/outdated.
>
>

We'll introduce a new status, 'pending', in order to avoid the following
situation (which I explained in my previous message):

> c. Things to consider
>
> i. All the clients must be out (perhaps is enough ‘deny new
> transactions’ shutdown mode)
>
> In other words ‘Secondary attachments cannot create clusters’ – else
> the already connected users to the database which is currently
> promoted to a cluster will remain attached to ‘Server-A:C:\MyDb…’ and
> the new ones will connect to all ones, thus generating a data
> de-synchronization between the nodes.
>

How does 'pending' work?

Let's take an example:
We have some clients connected to Server-A. We issue: CREATE CLUSTER
NODE 'SERVER-D:C:\DB01.FDB'. The engine then creates the file on
Server-D immediately and adds the corresponding entry, with the file
name and the status 'Pending', to RDB$Files on the cluster nodes. In
this status the file works as a shadow attached to Server-A. Of course,
something like ACTIVATE SHADOW needs to be implemented, so that Server-D
can open the file at any time (IMHO, this is a must, and not only for
this matter). Returning to our story, the file on Server-D is kept up to
date by the engine on Server-A until the number of clients on Server-A
reaches zero (which normally means that nobody is using the clustered
database). Then the engine updates the file status on each node (making
it on-line), Server-A detaches the shadow, Server-D activates it, and
the node on Server-D is on-line.
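
To make the 'pending' life cycle above easier to follow, here is a
minimal sketch in Python. It is only a toy model, not engine code: the
Cluster class, its methods and the in-memory dict standing in for
RDB$Files are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeStatus(Enum):
    PENDING = "pending"            # fed like a shadow by the source node
    ONLINE = "on-line"             # full cluster member
    OFFLINE = "offline/outdated"


@dataclass
class Cluster:
    # Hypothetical stand-in for the RDB$Files entries: file name -> status.
    files: dict = field(default_factory=dict)
    attached_clients: int = 0

    def create_node(self, file_name):
        """Model of CREATE CLUSTER NODE: register the new file as 'pending'."""
        self.files[file_name] = NodeStatus.PENDING

    def client_detached(self):
        """One client leaves; promote pending nodes once nobody is attached."""
        if self.attached_clients > 0:
            self.attached_clients -= 1
        if self.attached_clients == 0:
            self._activate_pending()

    def _activate_pending(self):
        # Only values of existing keys change, so iterating is safe here.
        for name, status in self.files.items():
            if status is NodeStatus.PENDING:
                self.files[name] = NodeStatus.ONLINE
```

With two clients attached, a node created by create_node() stays
'pending' after the first client_detached() call and becomes 'on-line'
only after the second one.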

-------------
Responding to Eduardo:

> First off all, I think CLUSTERS NODE fabulous, and I like a lot of yours
> commentaries, you was used a lot of time thinking on it. But I beliave
> shadows is not the better way. A Internal soluction like a flag used in
> NBackup (with a PageVersion from any cluster node) much better...
>
> Of coarse we need the primary database server. But if we turn off a
> secondary cluster node for a short time. I think the recovery time is very
> long with shadows (again I talk... it´s only my POV), and copying only
> changed pages is a fast way to do the same thing.

First of all, thank you very much for your input. Indeed, it is possible
that, in the situation you described above, copying only the changed
pages would be faster. But we have some things to observe here:

1. The possibility that '*we* turn off a secondary cluster node' is,
IMHO, let's say, very rare. AFAIK, "very few" administrators turn off
their servers. The most probable case, and the one clusters are designed
for, is a server crash due to a, let's say, 'unplanned' event (hardware
failure, power loss, unplugged network cable, viruses etc.). In this
case the affected database file is probably corrupted. The engine may or
may not see this corruption immediately. Anyway, IMHO, the database file
isn't reliable anymore and a page header scan won't tell us which are
the 'ill' pages (it will tell us only the 'changed' ones), and this is
the 'happy' case in which the db header structure remains intact. So, in
my very humble opinion, a page synchronization (I think that is what we
can call the thing you want to achieve) can be very fast but also very
fragile. Also, the delta file which is created during the page sync
process must be applied to both files, which requires a separate engine.

2. The remote shadow engine is ultra-fast, at least on Windows, where I
have experience. At 1Gb the network is transparent; at 100Mb it adds a
very small lag (i.e. 1% in our tests, which means that a test which
takes 100 secs without a shadow takes 101 with one). This is achieved by
writing only the changed pages, synchronously, to both files (main and
shadow). The node *creation* is done asynchronously (whenever we want;
see the 'Pending' point above), so the creation time doesn't matter.
Only the node *activation* has to be done with no one attached to the
clustered db, but the activation is blazingly fast: a few bytes changed
in the db header and one field of a record in RDB$Files.
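
As a rough illustration of 'writing only the changed pages,
synchronously, to both files', here is a small Python sketch. It is not
how the shadow code is written; write_page, PAGE_SIZE and the use of
in-memory files are assumptions made only for the example.

```python
import io

PAGE_SIZE = 4096  # assumed page size, chosen only for the illustration


def write_page(targets, page_no, data):
    """Write one changed page to every target before returning.

    'targets' is a list of seekable binary files (main file first, then
    the shadows). The caller gets control back only after all of them
    hold the page, which is what keeps main and shadow identical.
    """
    if len(data) != PAGE_SIZE:
        raise ValueError("a page must be exactly PAGE_SIZE bytes")
    for f in targets:
        f.seek(page_no * PAGE_SIZE)
        f.write(data)
        f.flush()


# Usage: in-memory stand-ins for the main file and its remote shadow.
main_file, shadow_file = io.BytesIO(), io.BytesIO()
write_page([main_file, shadow_file], 2, b"x" * PAGE_SIZE)
```

Only the page that changed travels over the network, so the extra cost
per write is one page, regardless of the size of the database.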

3. Why do you want to 'turn off the secondary cluster node for a short
time'? There are many more things to 'replicate' between nodes, not only
the database file. IMHO, the file is the simplest. There are open
contexts, TIDs, generators, metadata changes etc. That's why I chose to
do the 'replication' directly from the client. IOW, the 'replication
engine' is FbClient.dll, which sends the commands to all nodes, so we
don't get mixed up with DLMs, multi-host syncs, comm layers and so on.
So turning a cluster node off, even for a 'short time', renders it
unusable because of the continuous flux of state changes (mainly
contexts and TIDs) which must be kept in sync between the nodes. In my
small implementation, a node can become active in the cluster only when
all the clients are out (see above about 'Pending'). IMHO, I don't see
(for now) a solution for replicating live contexts across the servers.
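
The idea of doing the 'replication' from the client can be sketched
like this in Python. This is a toy model and not the FbClient.dll
interface: Node, ClusterClient and the simple TID counter are
hypothetical names used only to show why a node which missed some
commands cannot simply rejoin.

```python
class Node:
    """Hypothetical in-memory stand-in for one cluster node."""

    def __init__(self, name):
        self.name = name
        self.log = []        # every command this node has seen
        self.next_tid = 1    # toy transaction ID counter

    def execute(self, command):
        self.log.append(command)
        tid = self.next_tid
        self.next_tid += 1
        return tid


class ClusterClient:
    """Toy client which sends each command to all the nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def execute(self, command):
        results = [node.execute(command) for node in self.nodes]
        # Every node must answer with the same TID; if not, the states
        # have diverged and the cluster is no longer consistent.
        if len(set(results)) != 1:
            raise RuntimeError("nodes diverged: %r" % (results,))
        return results[0]
```

Three nodes which start together answer with the same TIDs. A fourth
node added after a few commands answers with a different TID, and the
client raises an error at the first command.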

> My first problem was UDFs (with access external files, drivers O.S.
> resources or web access), and some "update Stored Procedures"... but IF
> cluster support is made inside the DataBase... this operations can be
> "replicated" correctly in cluster nodes.

Anything which is 'inside' the database gets 'mirrored' on the nodes. UDF
DECLARE statements are; the .DLL/.SO files aren't. So you must 'replicate'
them 'by hand' ;)

> The other problem is USER control (grants, etc) ... but it´s other story...

This is solved (at least for Windows). The user account used to start
the process must have read-write rights on the share(s) on the other nodes.

(A small helpful hint, with love: Eduardo, it may help to press F7 at
the end of your messages (I saw that you use Outlook 6) or, if you want,
to change your e-mail client to Thunderbird. It has spell-checking as
you type. It is one of my ways of learning English. I'm also from a
Latin country, like you.)

hth,

m. th.

------------
A 'small' typo in my previous message. (It seems that my e-mail client
or, rather, the Yahoo! service stripped the leading spaces.) The
following 'diagram'...

>
>
> CL1 Server-A | | CL2 - Switch 1 ----- Switch 2 – Server B | |
> Server-C CL3
>