Subject Re: [IB-Architect] Disaster Recovery Strategy Requirements (1st draft)
Author Dalton Calford
Hi Jim,

If you give me the time to finish what I am putting forward, you will
find it covers all the things you are discussing below. It works, and it
works on databases of version 4 or greater. It does not need a lot of
maintenance, and the window of possible data loss is user configurable
based upon their environment.

While your statement is far more eloquent than my little core dump -
please understand that what I am writing about works. It works on a
multi-homed system with ISDN connections between cities. It also is
fast, while allowing me time to spend with my family and write long
winded documents like this one.

This is a series of methods found to allow people to cover everything
you are writing about without needing any more releases from InterBase,
nor any new APIs.

The reason I am putting it in a step-by-step (or in our case, mistake-by-
mistake) description is so that other developers can see the problems as
we did and perhaps come up with different solutions.

Once you see what the final solution is, you will see that it can be
implemented with a series of very simple wizards and the maintenance is
straightforward.



Jim Starkey wrote:
> 1. Problem Statement
> A fundamental rationale for database systems is to maintain the
> integrity of data in a hostile and inherently unreliable world.
> Correspondingly, database management systems provide protection
> from transactional anomalies with a commit/rollback mechanism and
> from hardware and system failures (including correct but erroneous
> commands) through backup and restore systems, shadow volumes, and
> redundant disk systems.
> The original InterBase backup and restore utility, gbak, uses the
> published database API to create a self describing file containing
> both database meta-data and table images. Gbak is robust but expensive:
> Backup requires a complete pass over a database; restore creates a
> new database, populates it, and recreates all indexes.
> Database shadowing was added in version 3 to permit maintenance of
> a hot backup copy of a database. Shadowing is a good solution for
> protection against a disk failure, but offers no protection against
> either operational error (correct but erroneous commands) or progressive
> internal corruption.
> Large and very large databases are poorly served by either gbak or
> the shadowing mechanism. Very large databases are simply too big
> to gbak frequently and take too long to restore following a disaster.
> Shadowing is at best a partial solution, leaving a database vulnerable
> to progressive corruption and catastrophic operational error.
> Very large database systems are generally archival in nature and
> therefore relatively stable. Portions of the database may be
> volatile but the bulk of the database changes slowly and then
> generally by extension rather than modification. These
> characteristics provide opportunity for alternative disaster
> recovery strategies to gbak and shadowing.
> 2. Requirements for a Disaster Recovery Strategy
> A disaster recovery strategy for InterBase must:
> a. Protect against hardware (disk, controller, cpu, memory),
> operational error (correct but erroneous commands), and
> progressive internal corruption induced by either bugs
> or transient hardware failures.
> b. Allow a DBA to intelligently trade off post-disaster recovery
> time against backup maintenance resource utilization.
> c. Support recovery without loss of committed data following
> a complete hardware failure.
> d. Support recovery without loss of committed data to the
> point of onset of an operational error, software error, or
> corrupting transient hardware error.
> e. Support recovery without loss of committed data from any
> single detected point of failure.
> f. Support incremental backup (or equivalent) at a resource
> cost in proportion with the volatility of the database.
> g. Allow transfer of database images to suitable archival
> medium.
> h. Support 24x7 availability of databases.
> 3.0 Technology Discussion
> Clearly, some form of incremental backup facility is required. There
> are, happily, many alternatives.
> Traditional operating system incremental backups operate at file-level
> granularity to streaming devices to create discrete incremental
> backup files. Recovery is achieved by first restoring a complete
> backup file, then applying, in order, incremental backups. The cost
> of complete recovery is relatively high (the systems do allow
> recovery of individual files). At the time these systems were
> designed, removable streaming media (tapes and the like) were
> vastly cheaper on a per byte basis than random access rotating media.
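For illustration only (this is not InterBase code, and the patch-record
layout is an assumption of mine), the full-plus-incrementals recovery
sequence described above can be sketched like this:

```python
import shutil
from pathlib import Path

def restore(full_backup, incrementals, target):
    """Restore by copying the full backup, then applying each
    incremental backup in the order it was taken (oldest first)."""
    shutil.copyfile(full_backup, target)
    for inc in sorted(incrementals):  # names assumed to sort chronologically
        apply_incremental(target, inc)

def apply_incremental(target, inc):
    """Each incremental is assumed to hold (offset, length, data) patches:
    an 8-byte big-endian offset, a 4-byte length, then the data itself."""
    with open(inc, "rb") as f, open(target, "r+b") as out:
        while header := f.read(12):
            offset = int.from_bytes(header[:8], "big")
            length = int.from_bytes(header[8:12], "big")
            out.seek(offset)
            out.write(f.read(length))
```

Note that skipping or reordering an incremental silently corrupts the
result, which is exactly why complete recovery with these schemes is
expensive and fragile.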
> Today's very large, extremely cheap disks offer alternatives unavailable
> a decade ago. One of these is an incrementally updated backup image.
> In this scheme, a backup database image is periodically
> incrementally resynchronized with the active version. The cost of
> synchronization can be proportional to the volatility of the database,
> so relatively frequent synchronizations are feasible. Following
> synchronization, the backup image can be copied to streaming media
> for archival purposes.
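One way to picture the incrementally updated backup image is a
page-level resynchronization in which only the pages changed since the
last sync are copied. The page size and the change-tracking set below
are my own illustrative assumptions, not InterBase internals:

```python
PAGE_SIZE = 4096  # illustrative page size, not InterBase's actual value

def resync(active, backup, changed_pages):
    """Copy only the pages modified since the last synchronization
    from the active database file into the backup image.
    Cost is proportional to len(changed_pages), i.e. to volatility."""
    with open(active, "rb") as src, open(backup, "r+b") as dst:
        for page in sorted(changed_pages):
            src.seek(page * PAGE_SIZE)
            data = src.read(PAGE_SIZE)
            dst.seek(page * PAGE_SIZE)
            dst.write(data)
    return len(changed_pages)
```

Because a stable database touches few pages between syncs, frequent
synchronizations stay cheap even when the database itself is huge.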
> A problem inherent in an incrementally updated backup image is that
> during synchronization the backup image is internally inconsistent
> and that a disk crash during synchronization could result in loss
> of both the primary database and the backup. To guarantee data
> integrity, either the backup image must itself be copied before
> synchronization or (better) two independent backups are used
> alternately. The latter option has the effect of increasing
> disk requirements to three times the active database. A side
> effect of supporting multiple incrementally updated backup images
> is that "last incremental" information cannot be stored solely in
> the active database. There are obviously other solutions to this
> problem that need to be explored and discussed.
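The alternating-image idea, including keeping the "last incremental"
state outside the active database, might look something like the
following sketch. The file names and JSON state format are purely
hypothetical:

```python
import json
from pathlib import Path

def pick_sync_target(state_file):
    """Alternate between two backup images so that while one is being
    resynchronized (and is internally inconsistent), the other remains
    a consistent fallback. The sync state lives outside the active
    database, since no single database file can safely record it."""
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"last_synced": None}
    target = "backup_b" if state["last_synced"] == "backup_a" else "backup_a"
    return target, state

def record_sync_done(state_file, target, state):
    """Mark the sync complete only after the image is consistent again."""
    state["last_synced"] = target
    state_file.write_text(json.dumps(state))
```

A crash mid-sync leaves the state file pointing at the previous,
untouched image, so a consistent backup always survives.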
> Any disaster recovery system capable of recovery from operational
> errors must support either an undo or a redo log (some systems
> support both). Although undo logs are probably better for recovery
> from operational error, they must be applied to a current database,
> making them all but useless for recovery from disk failure. Also,
> unlike the original InterBase journalling scheme, the redo log
> must contain index update records to avoid the need to recreate
> indexes following a roll-forward operation.
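As a rough sketch of the roll-forward idea (the record fields and log
shape here are assumptions for illustration, not a real log format):
index page writes travel in the log alongside data page writes, so the
replayed image needs no index rebuild, and stopping the replay just
before a known-bad log sequence number recovers up to the onset of an
operational error.

```python
def roll_forward(pages, redo_log, stop_lsn=None):
    """Apply redo-log records, in log order, to a restored backup image
    (modelled here as a dict of page number -> page image). Index page
    writes are replayed exactly like data page writes, so indexes need
    no recreation afterward. If stop_lsn is given, replay halts before
    it, recovering to the point just before an erroneous command."""
    for rec in redo_log:  # records assumed ordered by log sequence number
        if stop_lsn is not None and rec["lsn"] >= stop_lsn:
            break  # stop before the erroneous command's changes
        if rec["type"] in ("data_write", "index_write"):
            pages[rec["page"]] = rec["image"]
    return pages
```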
> Jim Starkey