Subject Re: [ib-support] Synchronising a multi-site database
Author Paul Schmidt
On January 28, 2003 09:52 am, "tickerboo2002" wrote:
> Hello
>
> I'm developing an application that will be used in five different
> locations and each night the databases should be synchronised to
> contain identical data ready for work the following day.
>
> At present, I'm tending towards having all the satellite offices
> sending the data to the head office, where the data gets synchronised
> and then sent back to each office. How do people do this - dial up
> using a modem / as an email attachment(!) / posting to a website?
> Can a process connect to another Firebird server via dialup and fire
> SQL over the line to get any changes?

You're asking for trade secrets here; well, maybe. If I were doing this, I would
probably have each branch bundle up its records and then FTP them to the main
office, with each office using its own identifier plus the date to make
a filename. The main server would run another app that detects the files
coming in and applies them to the main database. Then, at a specific
time, the main office would bundle up all of the changes (for all of the
branches) into output files; each office would then connect, download its
changes, and apply them to its own database.
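For what it's worth, here is a sketch of one Firebird-native way for the main
server to pick such a file up. Everything below is invented for illustration
(table name, columns, file layout), and external tables require fixed-width
records, so the branches would have to write their files in that layout:

    /* Map the incoming branch file as an external table.  The file
       name encodes the branch identifier plus the date, as above. */
    CREATE TABLE EXT_TOR EXTERNAL FILE 'TOR_20030128.dat' (
      CUST_ID   CHAR(8),
      CUST_NAME CHAR(40),
      PHONE     CHAR(15),
      EOL       CHAR(1)   /* room for the line terminator */
    );

    /* The apply program reads the rows and folds them into the
       main database. */
    SELECT CUST_ID, CUST_NAME, PHONE FROM EXT_TOR;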

> I think I need to give each row a branch ID and possibly
> a 'LastChanged' timestamp, so that only changed records can be sent
> to the head office, but what about deleted rows and rows where
> multiple offices have made changes to the same row? Or, is it best
> to have duplicate tables which initially start the day empty, and
> gradually fill up during the day (via triggers) with changed entries?

There are two ways I can think of: the easy way, and the in some ways hard way.

The easy way.

I would give each record an "owner"; that office is then responsible for
keeping that record up to date. For example, if the Toronto office is the
owner of the customer record for General Motors, then the Montreal office
could not update the General Motors customer record. This saves you having
one set of changes "walk over" another set of changes. Simply SELECT all
records changed since the last update and send those to the master; it
updates its copy and then sends all changes back out to the branches. Each
branch would not receive back its own changes.
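A minimal sketch of that nightly SELECT, assuming an OWNER_BRANCH column and a
LAST_CHANGED timestamp kept current by a trigger (both column names are my own
invention, not anything you already have):

    /* Send only the rows this office owns that have changed since
       the last run; :LAST_SENT is remembered by the sync program. */
    SELECT *
      FROM CUSTOMERS
     WHERE OWNER_BRANCH = 'TOR'          /* this office owns the row  */
       AND LAST_CHANGED > :LAST_SENT;    /* changed since last upload */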

The in some ways hard way

You use your triggers to build a journal table; this tracks the time, table,
primary key, field changed, and the new contents of the field. If you change
147 fields, you end up with 147 journal entries. Your daily send to the
master is simply a dump of the journal table, which is then emptied.
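A sketch of what the journal table and one of its triggers could look like
(names, sizes, and the CUSTOMERS columns are invented; note also that a plain
<> test misses NULL-to-value changes, which need extra handling):

    CREATE TABLE JOURNAL (
      CHANGED_AT  TIMESTAMP,
      TABLE_NAME  VARCHAR(31),
      PK_VALUE    VARCHAR(32),
      FIELD_NAME  VARCHAR(31),
      NEW_VALUE   VARCHAR(200)
    );

    SET TERM !! ;
    CREATE TRIGGER CUSTOMERS_JNL FOR CUSTOMERS
      AFTER UPDATE AS
    BEGIN
      /* one journal row per changed field */
      IF (NEW.PHONE <> OLD.PHONE) THEN
        INSERT INTO JOURNAL
          VALUES (CURRENT_TIMESTAMP, 'CUSTOMERS', NEW.CUST_ID,
                  'PHONE', NEW.PHONE);
      IF (NEW.CUST_NAME <> OLD.CUST_NAME) THEN
        INSERT INTO JOURNAL
          VALUES (CURRENT_TIMESTAMP, 'CUSTOMERS', NEW.CUST_ID,
                  'CUST_NAME', NEW.CUST_NAME);
      /* ...and so on for every field you track */
    END !!
    SET TERM ; !!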

The master loads the journals into one big table, sorts them by table,
primary key, and timestamp, and then applies the changes to the master
database. Since they are applied in timestamp order, even if one office
overwrites a change by another office, the overwrites at least happen in the
sequence the updates were actually made; of course, the times must all be in
the same time zone. Once a record is completely updated, a copy is put into a
file (this could be an SQL UPDATE statement), which will be sent to all of
the offices later on. When the branch offices receive the update file (they
FTP it from the server), they overwrite all changes that they have made with
the contents of the file. Then all of the offices have, at least for a few
seconds, essentially synchronized databases.
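The replay at head office could be as simple as this (MASTER_JOURNAL being a
hypothetical table holding all of the loaded branch journals), with the apply
program turning each row into an UPDATE:

    /* Oldest change first, per record: the last writer by branch
       clock wins, which is why one common time zone matters. */
    SELECT TABLE_NAME, PK_VALUE, FIELD_NAME, NEW_VALUE
      FROM MASTER_JOURNAL
     ORDER BY TABLE_NAME, PK_VALUE, CHANGED_AT;

    /* Each row becomes a statement along the lines of: */
    UPDATE CUSTOMERS SET PHONE = '555-0100' WHERE CUST_ID = '1234';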

The head office would maintain TWO copies of the database: one that is local,
where they make their changes, which get journalled like everyone else's, and
a second copy that is used for the journalled updates, so nobody actually uses
that one except the journal-updating program.


Deletes

You can't replicate a delete once the row is gone, so just flag it as deleted
using a status field; otherwise you can't transmit the delete to HO or
anywhere else. The advantage of this is that you can add an un-delete feature
by flipping the flag back to clear. Then write a program that goes through and
physically deletes the records marked as deleted, which you might run every
month or even annually; the most important thing is that you run it AFTER a
complete database backup.
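In SQL terms (STATUS and its 'D' / 'A' values are just an example convention):

    /* a "delete" is really an update, so it replicates like one */
    UPDATE CUSTOMERS SET STATUS = 'D' WHERE CUST_ID = :ID;

    /* un-delete: flip the flag back */
    UPDATE CUSTOMERS SET STATUS = 'A' WHERE CUST_ID = :ID;

    /* the periodic purge, run only AFTER a full backup */
    DELETE FROM CUSTOMERS WHERE STATUS = 'D';

The one catch is that your everyday queries then need a WHERE STATUS <> 'D'
(or similar) so flagged records stay hidden from the users.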