Subject FBReplicator...?
Author Jonathan Neve
Hi all!

I wrote a while ago to say that I was writing a replicator for IB/FB,
that I would be glad to contribute to the FireBird community. Wouldn't
it be nice for FireBird to be bundled with a good replicator, like
Interbase? I know there are several commercial replicators available,
other than IBReplicator, as well as one freeware replicator. However,
that replicator is only one-way. This could be useful, but, in many
circustances, it wouldn't be good enough. So that's why I thought I
could perhaps give you the replicator I wrote, because I would have been
quite glad if I had been able to find such a replicator a few months ago
before I started!

My replicator includes quite a few features that aren't present in
IBReplicator at all. This is because I had a created an database
application, and a database, without replication in mind at all. With a
bit of hindsight, I think I have put too much intelligence in the
database (mostly triggers), which often causes difficulties, and
certainly made replicating much harder, as triggers have a nasty
tendancy of firing at you when you're not expecting it! :-)

Too get around the many problems I came across, I had to implement some
rather advanced features that I've never seen in any other replicator
(perhaps they do exist somewhere, I've just never seen it).

So here is a brief description of how it works:

The situation this replicator was designed for was that I had one main
database, which all the clients were connected to through internet. To
make things faster, I have made so that now each client is to have their
own copy of the database, and all changes made to one get replicated to
the other, and vice-versa. I'm explaining this, just to say that as
things are at present, there has to be a direct connection to the main
database server in order to be able to synchronize changes. You can't
simply drop off the changes on some server, for the other clients to
then come and pick them up. I don't think it would be very difficult to
add this functionnality to the existing replicator, but as I have no
need of it, I didn't do it.

This is tied in with another aspect of the replicator, which is that it
requires no software on the server side. This was important for me, as
my database server runs under Linux.

So on each client machine there is a replicator, and on the server,
there is nothing special. Periodically, the replicator kicks in and
replicates all changes, first from the client to the server, then from
the server to the client. Both server and client can be used as local
database servers (such is my case).

So in each database, there are three triggers per replicated table (you
can configure which ones you want to replicate). These triggers insert
into a log table one line per event and per user, the users being listed
in another table. When the replicator then connects, it takes only such
records as concern the client machine it's running on, and then deletes
them.

So far, I think this is basicly how all replicators work. Here now are a
few specific features of mine.

One is replicating procedures. This sort of thing shouldn't usually be
necessary, but I implemented it in order to get out of a difficult
situation, and it might be useful also for someone else. The problem was
that I have a table in my database called STOCK. This table contains one
line per product and per sales-point, and a quantity field indicating
the number of these products in stock at the present time. To avoid
problems, I do all updates to this table through a dedicated procedure
called STOCK_MVT, to which I pass as a parameter the number of products
to add to/subtract from the stock. If I simply replicate this table,
what will happen? Suppose in databases A and B, a certain product had
value 45. Suppose 4 of this product are then added to the stock in
database A, and 3 are subtracted in database B. DB A will end up with
quantity 49 and DB B will end up with 42. Suppose DB A replicates at
this point. The value from DB A will get sent to the server, so the
server will have a value of 49. Then DB B replicates. So either the
server gets the value from DB B (42), or DB B gets the server's value
(49), but either way, both are wrong. The correct quantity is obviously
46. Yet none of the three databases would have this value!

So this is unacceptable. Instead, I now replicate my stored procedure
call. In other words, I created another procedure that calls my stored
procedure, and also logs the call in the replication log, with the exact
SQL statement that was executed. That way, the replicator simple reads
the "EXECUTE PROCEDURE" statement, and executes it. This solved my
problem, because now instead of sending the value to the server, each
database merely sends its change (+4 or -3). So now DB A would send +4
to the server, resulting in value 49. DB B would then send -3 to the
server, resulting in 46 on the server, and it would also get +4 from the
server, so that from 42 it would also move up to 46. Next time DB A
replicates, it would then get the -3 from the server, and every one
would have the same value.

As an example, here's the text of my procedure:

CREATE PROCEDURE RPL$STOCK_MVT
(
SITE CHAR(1),
LIBELLE VARCHAR(100),
ARTICLE VARCHAR(10),
TYPE_MVT CHAR(1),
ENTREE INTEGER,
SORTIE INTEGER,
DDATE TIMESTAMP,
DOCUMENT VARCHAR(20),
LOGCODE INTEGER
)
AS
DECLARE VARIABLE PK1_VALUE INTEGER;
DECLARE VARIABLE EXEC_PROC_STATEMENT VARCHAR(200);
BEGIN
PK1_VALUE = GEN_ID(GEN_RPL$STOCK_MVT, 1);
EXEC_PROC_STATEMENT = 'EXECUTE PROCEDURE RPL$STOCK_MVT(' || '''' ||
:SITE || '''' || ', ' || '''' || :LIBELLE || '''' || ', ' || '''' ||
:ARTICLE || '''' || ', ' || '''' || :TYPE_MVT || '''' || ', ' || :ENTREE
|| ', ' || :SORTIE || ', ' || '''' || :DDATE || '''' || ', ' || '''' ||
:DOCUMENT || '''' || ', ' || :LOGCODE || ')';
EXECUTE PROCEDURE STOCK_MVT(:SITE, :LIBELLE, :ARTICLE, :TYPE_MVT,
:ENTREE, :SORTIE, :DDATE, :DOCUMENT, :LOGCODE);
EXECUTE PROCEDURE RPL$GENERATE_LOG ('STOCK_MVT', null, :PK1_VALUE,
null, null, null,
null, null, null, null, null, null, null, null, null, null, null,
null, 2250, EXEC_PROC_STATEMENT);
END

Another difference is that I handle synchronizing column values that are
attributed automatically (either through a generator or through a stored
procedure). That is, I provide a generator or stored procedure that I
should use to get the values. Then, on the client side, these fields get
temporary values. When the record gets replicated to the main server,
the column values get updated to be synchronous with the server (I get
the generator value from the server, and update it locally). I then go
on with the replication.

This is good for values that are attributed automatically, and need to
be synchronized. However, I found another solution to this same problem
for the case of detail tables that depend on some other master table.
For example, I have an invoice table, and another table (called
FACTURES_CORPS) with all the invoice lines. The FACTURES_CORPS table has
a field (called FACTURE) that references the main (FACTURES) table. The
primary key of this table is governed by a generator, and is never seen
by the end user. For cases like this, I made another solution. I choose
a field (in this case, FACTURE), and instead of replicating each line of
an invoice individually, I delete all the lines having a certain value
of the FACTURE field, and then reinsert them all. Whilst I reinsert
them, I take the primary key value on the fly from the generator of the
server, and don't update it locally. So every time one line in the
invoice is updated, they all get deleted and reinserted. This is a
perfect solution for this sort of situation, because it's simpler and
less time-consuming than updating it locally. Besides, it avoids the
potential problem of the replication being stuck because of trying to
update a record which the user is busy modifying.

Also, to avoid problems with dependancies, I attributed to each table a
priority, so that changes don't get replicated in the order they occured
in, but in priority order. This also solved several potential problems.

As you can imagine, it could get tedious to code this by hand, so I made
a little program to generate the metadata for me. I display a DBGrid,
with the list of my procedures, and I simply select the one(s) I want to
replicate, set the priority, and the rest gets done automatically. I do
the same thing for the tables. I can also configure which users I want,
and it automatically creates them. Also, I automatically grant
everything to PUBLIC, which simplfies things for me. Obviously, this
behaviour should probably be changed to grant full rights only to the
replicator users, as some people might not want to grant all to PUBLIC! :-)
I also use this program to store a lot of configuration information in
the registry.

A few more details:
1) I made it so that the replicator logs everything it does in a memo on
the sceen, and if ever something goes wrong, my customer can send me the
complete log by email by simply clicking on a button.
2) I also handle getting new versions of the application automatically
by FTP. I simply have a table in the database to which I add a line
every time there is a new version to fetch, telling me which file to
take, and which SQL script to take. That way, I can make sure the
metadata is synchronized as well.

Now for some other considerations.

I'm submitting this code to you all as such. I would love to see it
become FBReplicator, but I cannot be in charge of organizing it. I'm
counting on finding someone else willing and able to adapt this code to
his own situation or somebody else's. After all, I've done the hard
part, there are just a few little things that should perhaps be
straightened out before releasing to the general public.

One such thing is 3rd party libraries.
There are actually two programs (written in C++Builder 5): Replicator,
and Repl_Config. In these programs, I use FIBPlus for database access,
Raize (not much) for the interface, EhLib(for the excellent DBGrid), RX
and Systools for the Systray, FormState, as well as registry access. I
also use Indy for the SMTP and FTP access. The only difficulty would be
with FIBPlus, Raize, and EhLib, as the others are free. However, I
hardly use Raize at all, so it would be very easy to remove that. The
only difficulty in removing EhLib would be that I use a nice feature of
the TDBGridEh, which is to import and export itself. However, I think
there are some free components that can do the same sort of thing. So
the only one that involves a bit of work is FIBPlus. But even that isn't
so bad. I think the whole thing is under about 3000 lines of code. I
have only 3 forms, and about 30-40 queries (approx.). So this is a bit
of work, but it wouldn't involve changing that much code, nor anything
structural.

The other main thing I can see is the language. All the interface is in
French. It shouldn't be too bad, as there isn't an extensive amount of
interface, but it would still involve a bit of translating.

Also, there's one little limitation, that could perhaps be removed
(although I don't see any immediate need to do so), and that is that I
only support multi-column primary keys up to 3 columns.

These are the only things I can think of that could require a little
work before the product is ready to be released.

However, as I said, I cannot be counted on to do this work. I made the
replicator for my own needs (or rather for my customer's), and I didn't
do it during my free time; that's my job. As a result, I can't really
afford to maintain an opensource product during what little free time I
have; I usually prefer to do other things than programming! :-)

So, as I have already said, I'm hoping that someone will take interest,
and organize the project. If anyone is interested, please contact me,
and I'll send you the source code. If you want to see what the
executable's like, I could send you that too.

I plan to write an extensive help file, one for the end-user, and one
for the programmer, explaining in detail the inner workings of the
replicator. This will of course be in French, but it could be translated.

Jonathan Neve.