Subject Re: [firebird-support] Replicator pseudo-code (sorry, it's a little long...)
Author Jonathan Neve
Hi!

Just a few comments. I guess the approach you suggest is viable, since
everyone else says it is, but to me it seems unduly complicated.
I've just written my own replicator, so I have thought about this
subject quite a bit recently.

For one thing, I think the idea of having one log table for each real
table is not very flexible. For example, that means that every time you
add a field to one of your tables, you have to remember to add it to the
corresponding log table on each database. This sort of thing seems
rather error-prone to me.
Also, this structure is going to take up a lot of room in your database.

I think using one central log table is much better. This means,
obviously, that you won't have all the fields of each table in the log
table, but only the primary key(s). I guess this could be seen as a
disadvantage, but in practice, I don't see what problems it could bring.
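
Concretely, the kind of central log table I have in mind would look
something like this. (For these sketches I'll use Python with the
kinterbasdb driver rather than pasting my actual C++Builder code; all
the names are invented for the example, and the last two columns will
only make sense further down.)

    import kinterbasdb

    con = kinterbasdb.connect(dsn="server:/data/main.gdb",
                              user="SYSDBA", password="masterkey")
    con.cursor().execute("""
        CREATE TABLE REPL_LOG (
            TABLE_NAME  VARCHAR(31) NOT NULL, /* which table changed */
            PK_VALUE    VARCHAR(64) NOT NULL, /* key of the changed record,
                                                 stored as a string for
                                                 simplicity */
            PRIORITY    INTEGER NOT NULL,     /* replication order */
            TARGET_USER VARCHAR(31) NOT NULL  /* which remote site it's for */
        )""")
    con.commit()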

Another thing I found from experience is that having a log containing
each change that was made to the database, indicating whether it was
an insert, a delete or an update, and then trying to blindly perform
the same operation on the remote database, is not a flexible structure
at all. I had started out like this, but I found that it brought many
problems. I know it's possible to make it work, but it's not flexible.

Consider the following examples:

- Suppose someone creates a record, realises that it was a mistake,
and deletes it right away. In the log you would have two lines. The
first would say that the record was inserted, and the second would say
that it was deleted. So the replicator tries to create an insert
statement for a record that doesn't exist any more. But it can't,
because in order to build the statement, it needs the values of all
the fields. So this fails.

- Suppose a primary key value gets changed. The same problem as above
occurs. You have a line saying to insert a certain record, but by the
time you try to create that SQL statement, that record no longer has the
same primary key value, and so you can't find it. At first I thought I
could get around this problem by using another field called PKOldValue,
in which I would put the old primary key value every time it gets
updated. But this doesn't work either, because the initial insert still
couldn't take place.

- Suppose you have a trigger on one of your tables (Table A) that
inserts a record into another table (Table B). So you insert a record
into table A, and a record gets automatically inserted into table B. In
the log, you therefore get two lines, one for each table. But when you
replicate table A, the insert gets done automatically into table B on
the other server. When you then subsequently replicate table B, the
insert fails because it has already taken place.

I think some of these problems might be less significant if you log all
the data changes rather than just the primary key. However, I think
there is a better solution than that.

So here's a brief description of how my replicator works:

My replicator is very simple: each database has triggers on all
tables that need replicating, which log the primary key values of all
changed records to the central log table. In the log table, there is
no indication as to the type of change (insert/update/delete). The
log table also contains the priority of the table. This priority is
set, for each table needing replication, using a configuration
program (which also creates the triggers automatically). Thus you
determine the priority of your tables based on their dependencies.
Each change is logged once for each remote site.
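
To give you an idea, here is roughly the kind of trigger the
configuration program generates, for an imaginary CUSTOMER table with
an integer primary key ID (Firebird 1.5 multi-action trigger syntax;
REPL_LOG is the sketch table from above, and the priority value is
whatever was configured for the table):

    import kinterbasdb

    con = kinterbasdb.connect(dsn="server:/data/main.gdb",
                              user="SYSDBA", password="masterkey")
    cur = con.cursor()

    # One row per remote site's replication user (more on users below).
    cur.execute("CREATE TABLE REPL_USERS"
                " (USERNAME VARCHAR(31) NOT NULL PRIMARY KEY)")

    cur.execute("""
        CREATE TRIGGER TRG_CUSTOMER_REPL FOR CUSTOMER
        AFTER INSERT OR UPDATE OR DELETE
        AS
        DECLARE VARIABLE UNAME VARCHAR(31);
        DECLARE VARIABLE PK VARCHAR(64);
        BEGIN
          IF (DELETING) THEN PK = OLD.ID; ELSE PK = NEW.ID;
          /* note: no insert/update/delete flag is logged, only the key */
          FOR SELECT USERNAME FROM REPL_USERS INTO :UNAME DO
            INSERT INTO REPL_LOG (TABLE_NAME, PK_VALUE, PRIORITY, TARGET_USER)
            VALUES ('CUSTOMER', :PK, 1, :UNAME);
        END""")
    con.commit()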

Then I have a program, written in C++Builder, which runs on each
remote site. There is no software on the main server (which was
important in my case, since ours is a Linux server). This program,
the replicator, connects to the main server, queries the log table,
and retrieves all changes that apply to the remote site it's running
on, ordering by priority, and grouping all identical records (all
changes for the same record) using a "select distinct ...". Then, for
each logged record, I select that record from the main server's
database. If it's not there, I issue a delete on my local database.
If it is, I try to select the same record from my local database. If
it's not there, I perform an insert; otherwise, an update. This is
quite a good optimisation, since only the newest version of the
record gets taken: even if a lot of changes were made to the record
since the last replication, it still only takes one insert statement.
Basically, rather than blindly applying a log of changes, I compare
the two databases so as to make them the same. This practically
cannot go wrong. Logically, it should be impossible for there to be a
primary key violation. In the worst case, if there are two
conflicting changes, the replicator will simply overwrite one of the
two. Normally, you should write your application in such a way as to
avoid such conflicts, but even if you don't, the replication process
will go smoothly.
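
In code, the whole pass boils down to something like this (again a
Python sketch with invented names, assuming for brevity that every
table has a single-column primary key called ID; the real program
reads the key columns and column lists from the metadata):

    import kinterbasdb

    # Source = the main server; destination = this site's local database.
    # Each remote site connects to the server as its own user (see below).
    src = kinterbasdb.connect(dsn="server:/data/main.gdb",
                              user="SITE1", password="secret")
    dst = kinterbasdb.connect(dsn="localhost:/data/site1.gdb",
                              user="REPL", password="secret")

    log = src.cursor()
    # One row per changed record, parents before children thanks to PRIORITY.
    log.execute("SELECT DISTINCT TABLE_NAME, PK_VALUE, PRIORITY"
                " FROM REPL_LOG WHERE TARGET_USER = CURRENT_USER"
                " ORDER BY PRIORITY")

    for table, pk, _priority in log.fetchall():
        sget, dget = src.cursor(), dst.cursor()
        sget.execute("SELECT * FROM %s WHERE ID = ?" % table, (pk,))
        row = sget.fetchone()
        if row is None:
            # Gone from the source: make it gone locally too.
            dst.cursor().execute("DELETE FROM %s WHERE ID = ?" % table, (pk,))
            continue
        cols = [d[0] for d in sget.description]  # column names from the source
        dget.execute("SELECT ID FROM %s WHERE ID = ?" % table, (pk,))
        if dget.fetchone() is None:
            # Not present locally: one insert, whatever the change history was.
            dst.cursor().execute(
                "INSERT INTO %s (%s) VALUES (%s)"
                % (table, ", ".join(cols), ", ".join("?" * len(cols))), row)
        else:
            # Present on both sides: overwrite with the source's version.
            sets = ", ".join("%s = ?" % c for c in cols)
            dst.cursor().execute(
                "UPDATE %s SET %s WHERE ID = ?" % (table, sets),
                tuple(row) + (pk,))

    dst.commit()
    # ...then the processed REPL_LOG rows are deleted on the source, and
    # the same pass runs in the opposite direction.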

Of course, there can sometimes be changes that don't go through for
some reason (for example, if the local record that the replicator is
trying to update is currently being modified). In such cases, I
simply put a copy of the record into a separate conflicts table.
However, I also keep the original record in the log table, so the
conflicting update will be attempted again on every run. Before each
replication run, I clear the conflict table, so that if the conflict
has gone away by the next run, there is no trace of it any more.
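
In the sketch above, this amounts to committing change by change and
wrapping each statement roughly like this (REPL_CONFLICTS is an
invented name; my actual program stores a copy of the whole record,
here I only keep the key and the error text):

    import kinterbasdb

    def apply_change(dst, table, pk, apply_fn):
        # apply_fn runs one of the insert/update/delete statements above.
        try:
            apply_fn()
            dst.commit()
            return True   # success: the log row on the server can go
        except kinterbasdb.DatabaseError as e:  # e.g. record being modified
            dst.rollback()
            dst.cursor().execute(
                "INSERT INTO REPL_CONFLICTS (TABLE_NAME, PK_VALUE, ERROR_TEXT)"
                " VALUES (?, ?, ?)", (table, pk, str(e)[:500]))
            dst.commit()
            return False  # failure: keep the log row, retry next run

    # And before each replication run:
    #     dst.cursor().execute("DELETE FROM REPL_CONFLICTS")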

Then the same process is repeated in the other direction.

This process works smoothly with as many remote sites as you like.
Currently, I only have two remote sites, with one main server, but
structurally, there would be nothing to change if we were to add more.

I have a table containing the list of all replication users. In my
triggers, I simply generate one log line per user, other than the
user that's connected. Thus, the replicator connects as REPL, or some
such user, and all changes that it applies to the local database from
the server are duplicated for all users other than itself, if there
are any. It's therefore perfectly possible to have several
alternative database servers which several clients would replicate
to, and which would in turn replicate to the main database server. In
my case, I don't do this though (I don't have that many clients); I
simply have two remote client machines, one of which has one other
machine connected to it directly through a local network.
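
In the trigger sketch from earlier, this is just one extra condition
on the fan-out loop; the configuration program would generate it like
this:

    # Revised fan-out loop for the earlier trigger sketch: skip the
    # connected user, so that changes applied by the replicator are not
    # queued to be sent straight back to where they came from.
    FAN_OUT_FRAGMENT = """
        FOR SELECT USERNAME FROM REPL_USERS
            WHERE USERNAME <> CURRENT_USER
            INTO :UNAME DO
          INSERT INTO REPL_LOG (TABLE_NAME, PK_VALUE, PRIORITY, TARGET_USER)
          VALUES ('CUSTOMER', :PK, 1, :UNAME);
    """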

I find this solution to be very fast and very flexible. I have also
integrated a system for fetching new versions of my application,
along with any SQL metadata changes that go with them, as well as a
means to send me by email a detailed log of everything that happened,
along with the contents of the conflict table, if ever there is a
problem.

My system has been in place for about two months, and it's working
out quite well. If you're interested, I can send you the C++Builder
source code.

Of course, all this is merely the fruit of much time spent thinking
about this and implementing it. Your ideas might well be better in
your circumstances.

I hope this helps you with your project,

Jonathan Neve.