Subject Re: [IB-Architect] Backups of large database & super transactions
Author Jason Wharton

> Jason, please start with the requirements. Unless and until you can
> state what problem you're trying to solve it is impossible to judge
> whether any particular scheme is a satisfactory solution.

I suppose my style does make one have to read between the lines some.

> I take it that you are unhappy both the time to back up a database
> containing many large and largely stable images and the time it takes
> to restore the database in face of a disaster.

Yes, these are two issues I have.

> Lets start with disaster. They come in natural (the disk crashes)
> and not so natural (somebody goofs and deletes something very important).
> The natural kind can be handled with a shadow set or a raid. The
> not so natural sort requires running time backwards.
> Since reversing polarity on the CPU clock to run time backwards
> is awkward, we're pretty much reduced to finding a time machine
> and going forward until just before the goof. Now we get some
> choices.

Thanks for the entertainment.

> The original journalling mechanism produced a backup stream by
> copying database pages to the journal then journalling incremental
> changes. To minimize the size of the journal, index changes weren't
> journalled. A full, proper recovery was to run the journal forward,
> gbak the recovered database, then restore a clean copy. The assumptions
> behind the design were that disasters would be infrequent and
> that the eventual joy of recovery compensated for the relatively high
> cost of recovery. Or, to put it another way, we trade efficiency of
> normal operation for inefficiency of recovery. Another deficiency
> is that it was broken many years ago and is not worth "recovery."

AFAIK this is nothing like what I am proposing. Please clear your mind of those
earlier mechanisms.

> There are lots of variations on the theme of spinning off a shadow
> and maintaining an incremental journal. The differences revolve
> around the relative costs of journalling vs. recovery. Full page
> changes are easy and robust, but voluminous. Incremental data
> changes are minimal to journal but expensive to restore. Add incremental index
> changes and you get a happy medium -- voluminous journal and slow
> recovery.

I'm not talking about maintaining an external "log" of any kind. I'm
proposing that the existing versioning engine be modified such that the
garbage collection and transaction management system will efficiently
protect a "frozen" transaction and make it so that the changes since that
time can be discernible. I see this as a natural extension/enhancement of
the versioning engine we already enjoy. The "log" is actually woven through
the version tree of records. We would just need a way to extract and
re-apply them.
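To make the idea concrete, here's a rough Python sketch of what I mean by the "log" being woven through the version tree. All the names and structures here are invented for illustration; this is not actual InterBase internals. Each record keeps a chain of versions, newest first, each stamped with the transaction id that wrote it. "Freezing" a super transaction pins an id, and the delta since the freeze is simply every record whose newest version was written by a later transaction:

```python
# Hypothetical model (NOT InterBase internals): a record's version chain,
# newest first, each version stamped with the writing transaction's id.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecordVersion:
    txn_id: int                        # transaction that wrote this version
    data: Optional[dict]               # None would model a delete stub
    older: Optional["RecordVersion"]   # back-pointer to the previous version

def delta_since(head: RecordVersion, frozen_txn: int) -> Optional[RecordVersion]:
    """Return the newest version if the record changed after the frozen
    super transaction, else None (record unchanged, nothing to stream)."""
    if head.txn_id > frozen_txn:
        return head
    return None

# Example: a record updated by txn 120 after a freeze taken at txn 100.
v1 = RecordVersion(txn_id=90, data={"name": "old"}, older=None)
v2 = RecordVersion(txn_id=120, data={"name": "new"}, older=v1)

assert delta_since(v2, frozen_txn=100) is v2    # changed since the freeze
assert delta_since(v1, frozen_txn=100) is None  # untouched record, skipped
```

The point is that garbage collection would protect the version visible at the freeze point, so this walk is always possible without any external journal.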

Agreed this isn't for every kind of database application... Triggers aren't
for everyone either. You choose to use them if you want to.

The backup would probably have to be changed significantly to allow for
this. The backup could probably no longer run as just another client;
instead it would need some special sort of internal "awareness" to know
which records are to be backed up and which ones are to be ignored.

What I think it boils down to is a mechanism to extrapolate a "log" from
record versions protected under a saved super transaction. Then, on the
restore side, it would have to take that log and apply it by creating
appropriate record versions.
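The restore side might look something like this sketch (again, names are illustrative, not real InterBase structures): each extracted log entry gets pushed onto the record's chain as a fresh head version, exactly the way a normal write creates a back version:

```python
# Hypothetical restore-side sketch. Versions are plain dicts here:
# {"txn_id": ..., "data": ..., "older": previous head or None}.

def apply_delta(table, log_entries, restore_txn):
    """table maps record key -> head version. Each (key, row) log entry
    becomes a new head version stamped with the restore transaction's id,
    with the old head preserved as its back version."""
    for key, row in log_entries:
        table[key] = {"txn_id": restore_txn, "data": row, "older": table.get(key)}
    return table

# Example: replay two changed rows on top of the frozen base image.
base = {1: {"txn_id": 90, "data": {"name": "old"}, "older": None}}
log = [(1, {"name": "new"}), (2, {"name": "added"})]
restored = apply_delta(base, log, restore_txn=200)

assert restored[1]["data"] == {"name": "new"}
assert restored[1]["older"]["data"] == {"name": "old"}  # chain preserved
assert restored[2]["older"] is None                     # brand-new record
```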

I agree that things like indexes should be left out. It should only be
actual data that gets streamed out. I also think that any metadata changes
should "break" the ability to walk from one freeze point to another. If
someone needs to alter their table structures then the database should be
considered incompatible with the other structure. Thus, a DBA would
establish a new base after performing any structural changes. It is already
my rule of thumb to do a total backup and restore after altering a database's
structure anyway.
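One simple way to enforce that rule (a sketch with made-up field names, just to show the shape of the check): stamp every delta with the schema version it was extracted under, and refuse to apply it against a base with a different stamp or a different freeze point:

```python
# Hypothetical "break on metadata change" check; field names are invented.

def can_apply(delta, base):
    """A delta is only valid against a base whose schema stamp matches and
    whose current freeze point is exactly the delta's starting point."""
    return (delta["schema_version"] == base["schema_version"]
            and delta["from_freeze"] == base["freeze_point"])

base = {"schema_version": 7, "freeze_point": 100}
good = {"schema_version": 7, "from_freeze": 100}
stale = {"schema_version": 8, "from_freeze": 100}  # an ALTER TABLE ran since

assert can_apply(good, base)
assert not can_apply(stale, base)  # DBA must establish a new base first
```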

> If we were to add page version numbers to the basic page layout,
> we could do incremental shadow spinoffs, which would be the
> cat's meow for large, reasonably stable databases.

Doesn't sound that interesting to me. This only solves one specific problem
rather than creating a foundation on which lots of other creative things
could be built. Sounds like more work to me too.
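For what it's worth, the page-version scheme Jim describes would presumably look something like this (purely illustrative, nothing like real page layouts): stamp each page with a version counter, and an incremental shadow spinoff copies only the pages written since the previous spinoff:

```python
# Hypothetical incremental-shadow sketch: page_versions maps page number ->
# version stamp bumped on every write to that page.

def pages_to_copy(page_versions, last_spinoff_version):
    """Return the page numbers written since the previous shadow spinoff."""
    return sorted(p for p, v in page_versions.items() if v > last_spinoff_version)

# Example: only pages 1 and 3 changed since the spinoff at version 10.
pages = {0: 3, 1: 17, 2: 5, 3: 21}
assert pages_to_copy(pages, last_spinoff_version=10) == [1, 3]
```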

> But unless and until we agree on the requirements for disaster
> recovery, we can't evaluate possible alternatives.

This would probably spin us into deadlock and nothing would ultimately get
accomplished. You don't seem to be the agreeable type (nor am I). <g>

> So, Jason, in your application:
> 0. What type of disasters must you prepare against?

Hardware and misuse. Stray cosmic particles are an occasional concern of
mine too.

> 1. How often can you afford to perform a full backup?

No more than once a week; I can easily imagine others on more of a monthly
schedule.

> 2. After a disaster, how long can a recovery take?

Isn't the goal for a commercial product to make this as short as possible?

> 3. Is it reasonable to assume a skilled DBA is available
> during recovery to resolve problems?

If there isn't one then they probably don't have a database over 500MB.

> 4. How much additional I/O can you tolerate to drive a journal?

This isn't about maintaining a journal.

> 5. How much disk (measured in multiples of database size) are
> you willing to dedicate to disaster recovery?

We would probably store the backups on tape. If we wanted the "on-deck"
database ready and waiting on another machine (to have the latest delta
applied to it before going live) then it would be no more than the same disk
space on the original machine.

As a commercial product I imagine keeping this as small as possible is an
important goal. My case most likely doesn't represent the average case in
the industry.

> 6. When recovering from a journal, what information would a
> DBA logically require to find the stopping point?

Again, your biases are leaking in here.

> 7. What questions have I missed?

Don't know. I wasn't really digging for more questions.

Jason Wharton
CPS - Mesa AZ

----- Original Message -----
From: "Jim Starkey" <jas@...>
To: <>; <>
Sent: Friday, June 16, 2000 1:17 PM
Subject: Re: [IB-Architect] Backups of large database & super transactions

> At 11:36 AM 6/16/00 -0700, Jason Wharton wrote:
> >
> >> Like most hard problems, the place to start is to collect and
> >> prioritize requirements. When we have a handle on the problem,
> >> we can work on a solution.
> >
> >I know what the problem is and I am working on a solution. Just seems
> >people are more interested in shooting holes in my ideas that seeing how
> >they solve real problems... I'm out in the real world solving business
> >problems with InterBase on a daily basis... (not trying to imply anything
> >other than that)
> >
> Jim Starkey