Subject | Replicator pseudo-code (sorry, it's a little long...)
---|---
Author | Nigel Weeks |
Post date | 2004-01-21T05:52:22Z |
A little while ago, the topic was raised of using combination primary keys
with integers to help with replication.
If anyone has time, could they look at this pseudo-code, and give any
feelings/concerns/etc...?
It's a central replication server that copies data from sourceDBs to destDBs,
using a central Inbox/Outbox paradigm.
Each database that mod_repl hooks into MUST HAVE
int_siteid INTEGER NOT NULL,
dtm_rstamp TIMESTAMP DEFAULT 'now',
...
in every table.
This is so data that was created at one site can be copied to another site
without (or at least with fewer) primary key violations; dtm_rstamp gives a
fallback for ordering.
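As a concrete sketch, a hypothetical addr table might look like this (the
int_pkey and name columns are purely illustrative, not part of the scheme):

    CREATE TABLE addr (
        int_siteid INTEGER NOT NULL,          -- site that created the row
        int_pkey   INTEGER NOT NULL,          -- per-site key value
        name       VARCHAR(60),
        dtm_rstamp TIMESTAMP DEFAULT 'now',   -- fallback ordering key
        PRIMARY KEY (int_siteid, int_pkey)    -- combination key: no clashes between sites
    );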
Change log tables have to be produced for each table in the database. Each log
table has all the fields of the table, plus the GMT of the change (timezone
independent).
The Replicator server then has a set of Inbox and Outbox tables that match
the databases' tables, including all data.
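For the addr sketch above, the log table might look like this (the int_logid
and chr_action columns are my own assumptions; a unique log id makes the
delete-after-copy steps later on safe):

    CREATE TABLE repl_addr (
        int_logid    INTEGER NOT NULL PRIMARY KEY,  -- assumed: unique log row id
        chr_action   CHAR(1) NOT NULL,              -- assumed: 'I', 'U' or 'D'
        o_int_siteid INTEGER,     n_int_siteid INTEGER,
        o_int_pkey   INTEGER,     n_int_pkey   INTEGER,
        o_name       VARCHAR(60), n_name       VARCHAR(60),
        dtm_gmt      TIMESTAMP NOT NULL             -- GMT of the change
    );

The replicator's Inbox_addr would mirror this layout plus a src_siteid column,
and Outbox_addr the same plus a dest_siteid column.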
------ Begin Pseudo ------
Replication Method
When an insert, update, or delete happens on the local db with local data,
logs are written into repl_<tablename> as O_field, N_field, etc.
These logs accumulate indefinitely until the replicator collects them.
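A minimal update-trigger sketch in Firebird-style syntax, assuming the
repl_addr layout above (note CURRENT_TIMESTAMP is local time; storing true GMT
would need a UDF or a stored offset):

    -- assumes: CREATE GENERATOR gen_repl_log;
    CREATE TRIGGER addr_log_upd FOR addr
    AFTER UPDATE AS
    BEGIN
        INSERT INTO repl_addr (int_logid, chr_action,
                               o_int_siteid, n_int_siteid,
                               o_int_pkey,   n_int_pkey,
                               o_name,       n_name,
                               dtm_gmt)
        VALUES (GEN_ID(gen_repl_log, 1), 'U',
                OLD.int_siteid, NEW.int_siteid,
                OLD.int_pkey,   NEW.int_pkey,
                OLD.name,       NEW.name,
                CURRENT_TIMESTAMP);  -- local time, not GMT - see note above
    END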
The 'Collector' queries each database and retrieves any new changes. It copies
the changes to its local 'Inbox' for that table and, if the copy was
successful, deletes them from the sourceDB's log tables.
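In SQL terms the Collector is roughly three statements per table, split across
two connections (a sketch only; int_logid is the assumed unique id from the
repl_addr layout above):

    -- 1. on the source DB: fetch pending changes
    SELECT * FROM repl_addr ORDER BY dtm_gmt;

    -- 2. on the replicator, per fetched row (:src_siteid = the source's site id)
    INSERT INTO inbox_addr (src_siteid, int_logid, chr_action,
                            o_name, n_name, dtm_gmt)
    VALUES (:src_siteid, :int_logid, :chr_action,
            :o_name, :n_name, :dtm_gmt);

    -- 3. back on the source DB, only after step 2 commits
    DELETE FROM repl_addr WHERE int_logid = :int_logid;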
A process then runs on the 'Inbox' table and, for each record, populates the
'Outbox' table with a matching record for every database server EXCEPT the one
where the event came from in the first place (it generated the event, so it
already has it).
If this process is successful, the 'Inbox' will be empty, and the 'Outbox'
will have changes ready to send out to other servers.
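That fan-out can be a single insert-select per Inbox table; roughly like this,
assuming a setup_sites table listing every site id (the o_/n_ site and key
columns are elided for width):

    INSERT INTO outbox_addr (dest_siteid, src_siteid, int_logid, chr_action,
                             o_name, n_name, dtm_gmt)
    SELECT s.int_siteid, i.src_siteid, i.int_logid, i.chr_action,
           i.o_name, i.n_name, i.dtm_gmt
    FROM inbox_addr i
    JOIN setup_sites s ON s.int_siteid <> i.src_siteid;

    -- in the same transaction, only if the insert succeeded:
    DELETE FROM inbox_addr;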
A process then queries the 'Outbox', connects to servers, and performs
updates. If changes are successful, the record in the 'Outbox' is deleted,
and the transaction is committed.
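On a destination DB, each change type boils down to an existence check plus
one statement. For the update case of the addr sketch (parameter markers stand
in for values carried in the Outbox row):

    -- does the target row exist? (drives error codes 10/30/50 below)
    SELECT 1 FROM addr
     WHERE int_siteid = :o_int_siteid AND int_pkey = :o_int_pkey;

    -- if it does, apply the change
    UPDATE addr
       SET name = :n_name
     WHERE int_siteid = :o_int_siteid AND int_pkey = :o_int_pkey;

    -- back on the replicator, if no error was raised:
    DELETE FROM outbox_addr
     WHERE int_logid = :int_logid AND dest_siteid = :dest_siteid;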
Rundown: 3 database servers
Change to S1 - addr.name changed from 'Eric' to 'Phil'
Trigger logs change into repl_addr: o_name='Eric', n_name='Phil'
Collector:
  For each source database
    Connect to database
    For each table
      Scan changelogs
      For each changelog line
        Find record: o_name='Eric', n_name='Phil'
        Copy changelog to Inbox_tablename: S1(sourcedb): o_name='Eric', n_name='Phil'
        If copy to local Inbox_addr worked
          Delete the changelog from S1
        Else
          Copy changelog, sourceDB, 'Collector Problem' to failure table
      Loop for each changelog line
    Loop for each table
    Disconnect
  Loop for each source database
Processor:
  For each Inbox table
    For each record
      Join to setup table and insert rows into 'Outbox' where int_siteid != SourceSiteID
      If insert successful for all sites, delete from 'Inbox'
    End of loop for records
  End of loop for Inbox tables
Courier:
  For each DestDB (order by destdb, dest.table, dest.timestamp)
    Connect to DB
    If connect failed
      Skip to next database
    Else
      For each repl.<tablename>
        For each changelog line
          Start transaction
          Set error to 0
          If changelog is insert
            Check to see if record exists already
            If record exists
              Set error to 10
            Else
              Insert new record
              If insert failed
                Set error to 20
          If changelog is update
            Check to see if record exists already
            If record does not exist
              Set error to 30
            Else
              Update record
              If update failed
                Set error to 40
          If changelog is delete
            Check to see if record exists already
            If record does not exist
              Set error to 50
            Else
              Delete record
              If delete failed (record still exists)
                Set error to 60
          If error > 0
            Copy changelog, SourceDB, DestDB, error code to failure table
          Else
            Delete changelog from repl.<tablename>
          // (outside of DestDB transaction context)
          Commit transaction (do small changes - saves deadlocking people out)
        Loop for each changelog line
      Loop for each repl.table
    End of if for connect success
  Loop for each DestDB
# Test scenarios:
One server goes down, and people use its data on the backup servers.
On site 1 (S1), data for S2 is updated, as S2 is down:
Physical Site | Data Site | PKEY | Data | Action
---|---|---|---|---
S1 | 2 | 1 | freb |
S1 | 2 | 1 | Fred | Update
trigger inserts 2:1:old=freb,new=Fred into REPL table
Collector:
Finds 2:1:old=freb,new=Fred from site S1. Writes it into Inbox
Processor:
Moves data that came from physical site S1 into the Outbox for sites S2, S3
Move was successful, so 'Inbox' record is deleted
Courier:
Finds records to go to sites S2, S3 (As it physically came from S1)
Connects to S2: site is down.
Because transfer failed, record is not deleted
Connects to S3: Success
Record transferred. Success is true, 'Outbox' record for S3 is deleted
(Record for S2 remains, and is attempted every time the 'Courier' runs)
S2 comes back on line
Courier:
Connect to S2 succeeds. Data is transferred (may take a while)
All data queued for this site from other physical sites is transferred, and
'Outbox' is cleared completely.
------- End of pseudo --------
Any feelings? If you do reply, PLEASE cull back the quoted code (list traffic,
etc.)!
Nige.