Subject Re: [ib-support] Failover Strategies
Author Paul Schmidt
On 14 Dec 2001, at 10:56, John Peterson wrote:

> I wish to have an application fail over automatically to a backup
> server in the event that the primary fails (power fail, IB failing,
> LAN failing etc.), specifically to know if the primary is operating.
>
> The application uses IBO, and DML Caching, allowing the server to
> "push" table changes out to the clients without the client continually
> polling the server.
>
> I would be interested to hear how people implement a low cost "ping"
> request to the server. A simple query, or call to a SP may be best,
> since it would test all required systems.
>
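
On the "ping" itself, here is a minimal sketch of the cheapest check I can think of: just try to open a TCP connection to the server's port. The function name and the 2-second timeout are my own assumptions; 3050 is the usual InterBase port (gds_db), but yours may be configured differently. Note this only proves the machine, the LAN and the listener are up. It does not prove the database engine can answer a query, so a trivial query or SP call through IBO, as you suggest, is still the more thorough test.

```python
import socket


def server_reachable(host, port=3050, timeout=2.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: it tests the network path and the listening port only,
    not whether the database itself can execute a statement.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

A client could call this from a timer, and only fall back to the backup server after a few consecutive failures, so that one dropped packet does not trigger a failover.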

If you're using SuperServer, it should restart itself automatically,
so the question becomes what happens if the server dies at the O/S
level, such as the O/S crashing (Windows) or a hardware failure
(Windows or Unix). There are various failover systems available. I'm
not sure about PC/Linux stuff, and I am sure Microsoft has a $50,000
solution that works half the time and means you need to upgrade
to W2K Enterprise. Some of the main-line Unixes have automated
failover systems, although you could probably fake it in software.

The primary server could write a heartbeat file. It needs to write
this on the secondary server's file system, and it writes the file
every 15 seconds. The secondary server checks the file every 15
seconds; if it's more than 60 seconds old, the secondary assumes the
primary has failed, updates DNS to point the primary's name to
itself, and restarts DNS. Workstations only need to know that if
they lose their connection, they should reconnect. Of course you
need some form of replication so that the secondary and primary
servers are kept up to date with each other.
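
The heartbeat scheme above can be sketched roughly like this. It is only an illustration under some assumptions: the function names are mine, the file path is whatever share the primary can write to on the secondary, and `take_over` is a hypothetical callback standing in for the DNS update and restart, which depend entirely on your setup. It also assumes the file's modification time and the secondary's clock are comparable.

```python
import os
import time
from pathlib import Path

HEARTBEAT_INTERVAL = 15  # seconds between writes (primary) and checks (secondary)
STALE_AFTER = 60         # seconds without an update before the primary is presumed dead


def write_heartbeat(path):
    """Primary side: rewrite the heartbeat file with the current time."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(str(time.time()))
    # Swap the new file into place so a reader never sees a half-written file.
    os.replace(tmp, path)


def heartbeat_age(path, now=None):
    """Secondary side: seconds since the heartbeat file was last modified."""
    if now is None:
        now = time.time()
    try:
        return now - Path(path).stat().st_mtime
    except FileNotFoundError:
        # No file at all is treated the same as an infinitely old one.
        return float("inf")


def primary_has_failed(path, now=None, stale_after=STALE_AFTER):
    """True when the heartbeat is older than the allowed limit."""
    return heartbeat_age(path, now) > stale_after


def watch(path, take_over, sleep=time.sleep):
    """Secondary side: poll until the heartbeat goes stale, then call take_over()."""
    while not primary_has_failed(path):
        sleep(HEARTBEAT_INTERVAL)
    take_over()
```

The primary would call `write_heartbeat()` from a loop or scheduled job every 15 seconds, and the secondary would run `watch()` with a `take_over` function that does the DNS change. The 60-second limit tolerates three missed writes before failing over.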

Network problems are the biggest pain, and the bigger the network,
the bigger the pain. It could be as simple as one workstation
affected because someone drilled a hole in a network cable, or half
the network because a router packed up. You may want to design the
database so that the servers can run independently of each other as
well as together. For example, for records that need unique keys,
create keys that are unique across both servers, then put the
servers on different network segments. If a router fails and the
servers lose their connection with one another, each can capture
data independently, and you can merge the data when the connection
is working again. You may want printed reports to show that the
system is running in "degraded mode" when the servers are not
talking to each other but are both operational.
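
One way to get keys that are unique across both servers is to interleave them: each server starts at a different number and steps by the number of servers, so one hands out odd keys and the other even keys. The sketch below is only an illustration of that idea; the class and function names are made up, and in a real database you would do this with a generator on each server rather than in application code.

```python
class InterleavedKeys:
    """Hands out keys that cannot collide with those from the other server(s).

    Hypothetical helper: server 0 of 2 yields 1, 3, 5, ... and
    server 1 of 2 yields 2, 4, 6, ...
    """

    def __init__(self, server_index, server_count=2):
        if not 0 <= server_index < server_count:
            raise ValueError("server_index must be in range(server_count)")
        self._next = server_index + 1
        self._step = server_count

    def next_key(self):
        key = self._next
        self._next += self._step
        return key


def merge_rows(primary_rows, secondary_rows):
    """Combine rows captured independently; refuse to merge if any key clashes."""
    clashes = primary_rows.keys() & secondary_rows.keys()
    if clashes:
        raise ValueError(f"duplicate keys on both servers: {sorted(clashes)}")
    merged = dict(primary_rows)
    merged.update(secondary_rows)
    return merged
```

With keys allocated this way, rows inserted on each server while the link is down can be merged afterwards without renumbering anything. Rows that were updated on both sides during the outage are a separate problem that this does not address.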

Paul

Paul Schmidt
Tricat Technologies
paul@...
www.tricattechnologies.com