Subject Re: [IB-Architect] Full and Differential Backup
Author Jim Starkey
At 02:05 PM 6/23/00 +0200, Olivier Mascia wrote:
>That's really a 'blind' proposal here, I'll have to learn much from the soon
>to be available source-code to analyze the potential difficulties of
>implementation. Let's consider the following as a rough proposal of backup
>method. It would most probably imply a ODS change, though minor. And some
>engine modifications, maybe major.

Not really. What you are describing is the mechanism the engine
used to dump a database to a journal file when creating a journal.
A close relative (and probably an even better candidate) is the
shadow create code.

>2. The backup processing is exclusive. There can only be *one* and only
>*one* backup in progress per database.

I don't think this restriction is actually necessary.

>3. The backup processing requires a separate processing thread. And
>synchronization of this thread actions with all engine actions that modify
>pages. To be considered modified ("touched") a page does not need to be
>rewritten to disk (cache). So the synchronizations needed will have to occur
>where logical modifications to pages are done. That backup thread only
>exists when a backup is in progress.

By design, disk page writes are scheduled so the database on disk
is always valid and consistent. This is not true of dirty pages
in cache. You are much safer considering page changes only when
the engine would otherwise be writing a page to disk.

>6. Here the basic idea. The backup is a sequential scan page by page of each
>pages of the main database file, followed by each secondary file.
>Essentially, the backup output (which may optionally be more or less
>compressed on the fly) is a copy of each pages in the physical order in
>which they are read. When the backup is about to read a page, all engine
>write operations to that page must be able to be suspended (here is the
>synchronization issue introduced above). The page is read (from disk or
>memory cache), depending where the "current" instance of the physical page
>is. And appended to the backup file. The writes operations of the engine to
>that page (which were suspended) are resumed. The backup thread keeps a
>current page info : the "highest" page that has been copied to the backup.
>The engine internal operations have to be modified to check
>(synchronization again) that "highest" backuped page before any attempt to
>update any page. The key idea is this : if the about to modified page HAS
>NOT yet been processed by the backup thread, green light, the engine may
>proceed normally. If the page HAS ALREADY been backed up, the page
>identification (can we speak of a page number ?) is appended to a log. That
>log may be in RAM or in a sequential append disk file. And the engine update
>the page.

May I make two suggestions to simplify this?

First, relax the requirement that pages be written in order. The
engine is always free to write a page above the backup high water mark.
If it wants to write a page already copied to the backup, just write
the new page to the backup file as well. Simple and works like a
charm. Both the journalling code and the snapshot code use this hack.
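
A minimal sketch of the first suggestion, in Python for brevity. It is
not the engine's journalling or snapshot code; the names StreamingBackup,
copy_next_page, engine_write and restore are hypothetical, and the
"database" is just an in-memory list of page images. The hook sits where
the engine would write a page to disk, not where it dirties a page in cache.

```python
import threading


class StreamingBackup:
    """Append-only backup stream of (page_number, page_image) records."""

    def __init__(self, database):
        self.database = database      # list of page images, stands in for the disk file
        self.stream = []              # stands in for a tape or sequential file
        self.high_water = -1          # highest page number already copied
        self.lock = threading.Lock()  # serializes backup copies and engine writes

    def copy_next_page(self):
        """Backup thread: copy the next page; return False when nothing is left."""
        with self.lock:
            page_no = self.high_water + 1
            if page_no >= len(self.database):
                return False
            self.stream.append((page_no, self.database[page_no]))
            self.high_water = page_no
            return True

    def engine_write(self, page_no, image):
        """Engine: write a page to disk, re-sending it if already backed up."""
        with self.lock:
            if page_no == len(self.database):
                self.database.append(image)   # new page at end of file
            else:
                self.database[page_no] = image
            if page_no <= self.high_water:
                # Already copied: append the new image, out of order.
                self.stream.append((page_no, image))


def restore(stream):
    """Replay the stream in order; a later image of a page replaces an earlier one."""
    pages = {}
    for page_no, image in stream:
        pages[page_no] = image
    return [pages[n] for n in sorted(pages)]
```

Pages above the high water mark cost nothing extra, since the scan picks
them up later. Pages below it simply appear twice in the stream, and the
restore keeps the last copy.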

Second, if you are willing to assume the backup file is a disk file
rather than a streaming device, create a page by page image of
the database file -- secondary writes (previous paragraph) just
overwrite previous writes. This has the added benefits of a)
reducing the size of the backup file and b) eliminating the restore
step. Also notice that this is the existing shadow creation code.
Other than ripping out the check for non-local files and adding
a clean mechanism to detach the shadow, the work is done.
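
A rough sketch of the page-image variant, again in Python and again not
the actual shadow code. ShadowImage, write_page, detach and the tiny
PAGE_SIZE are assumptions made for illustration. Each page lands at its
natural offset, so a second write of the same page overwrites the first.

```python
import os

PAGE_SIZE = 32  # hypothetical page size, far smaller than a real one


class ShadowImage:
    """Random-access backup file laid out page by page like the database."""

    def __init__(self, path):
        self.file = open(path, "w+b")

    def write_page(self, page_no, image):
        """Store a page at its natural offset, replacing any earlier copy."""
        if len(image) != PAGE_SIZE:
            raise ValueError("page image must be exactly PAGE_SIZE bytes")
        self.file.seek(page_no * PAGE_SIZE)
        self.file.write(image)

    def detach(self):
        """Flush to disk and close; the file can then be copied around as is."""
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()
```

The resulting file is never larger than the database image, and using it
needs no replay step.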

>8. Now for pass two. During this phase, any page going to be touched by the
>engine (existing pages, or new pages added at the end of the database file)
>has to be logged in append to the log (not the page content, just its

Skip this. If you add pages out of order to the backup (serial or
random access), this pass isn't needed.

>9. To restore such a backup, the restore utility reads sequentially the
>pages stored during phase1 and reconstruct (sequential process) the database

Hey, we can skip this too. We're on a roll. Take the detached shadow,
call it a backup file. Move it to your tape farm. When the bad
day comes, the restore utility is the copy command.

>OK, this is a brute explanation. But I suppose everybody now gets the
>general picture.
>There is no original ideas in this. And more knowledgeable people would
>surely have expressed this in thousands words less than me. My question :
>is such a backup scheme something that may be implemented ? It all depends
>on many deep internal details of the engine. So it may totally be

You have been down a path traversed by many others. That's good. It
means you're not lost. Keep going and we'll get to the next round
of opportunities, problems, and solutions.

>In the current implementation for that other kind of database system, we
>have a small optimisation. There is a concept of page "zero" in each
>physical files of the DB that may well be updated thousand times during the
>course of the backup. The only "optimized" thing done is to NOT backup those
>zero pages during phase one. Consider them as implicitly ever touched. And
>append them at the end.

Good hack. Probably not desirable for the shadow case, but when writing
to a streaming device, a great win.
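
For the streaming case, the hack might look like the sketch below. The
function name stream_backup, the read_page callback and the hot_pages
parameter are all hypothetical. Hot pages are treated as always touched,
skipped during the scan, and read exactly once at the end.

```python
def stream_backup(read_page, page_count, hot_pages=(0,)):
    """Yield (page_number, image) records, deferring hot pages to the end."""
    hot = set(hot_pages)
    for page_no in range(page_count):
        if page_no not in hot:
            yield page_no, read_page(page_no)
    # Hot pages are read last, so updates made during the scan are included.
    for page_no in sorted(hot):
        if page_no < page_count:
            yield page_no, read_page(page_no)
```

With a three-page file and the default hot_pages, the records come out
in the order 1, 2, 0.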

>With Interbase and MGA this also implies that the restored database will be
>in the instant state it had at time the backup ended. With potentially a lot
>of transactions started but not ended at that time. So all the scheme would
>be defeated if the server is not able to recognize all those transactions as
>abandonned (limbo transactions ?) at restart of the server on the freshly
>restored database.

No, this scheme preserves transactions in limbo just fine (rule one
of limbo is that everything is safe on oxide before TIP is changed to
limbo). Transactions in progress will be declared dead when the
server first attaches the backup image.
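
As a toy model only: assume the TIP is a mapping from transaction id to
state. The TxState enum and recover_on_first_attach below are invented
for illustration and are not the engine's data structures.

```python
from enum import Enum


class TxState(Enum):
    ACTIVE = "active"
    LIMBO = "limbo"
    COMMITTED = "committed"
    DEAD = "dead"


def recover_on_first_attach(tip):
    """Return a copy of the TIP with in-progress transactions marked dead."""
    return {
        tx_id: TxState.DEAD if state is TxState.ACTIVE else state
        for tx_id, state in tip.items()
    }
```

Limbo and committed entries pass through untouched; only transactions
still active in the backup image change state.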

>This also implies a certain loss of work (due to those abandonned

A fixed backup has to stop someplace. The next part of the exercise
is to design a "redo" log so nothing ever gets lost assuming no more
than a single point of failure.

>10. For differential backups, one could imagine to have a backup counter in
>each page. And a "last backup done" number in a main header page for the
>database. After a full backup, the idea would be to increase the global
>counter to the highest backup number encountered in any of pages. So
>basically we know after a full backup that the last time a full backup was
>done, we backed up all pages marked with that number or a smaller one.

Bingo. Give the man a kewpie doll. But rather than using it as an
incremental backup counter, use the same word as a page revision
number. If we tweak the database to maintain a separate (pages? file?)
set of current page numbers, a backup can be synchronized with the
active database by comparing record version numbers. There are
some trickinesses here that can be fun to work out. The problems are
maintaining a satisfactory level of reliability (or figuring out a way
to cope with unreliability) without taking a performance hit.
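
One way the comparison could go, assuming each page carries a revision
number and a separate list of current revisions is kept for the active
database. The functions pages_to_refresh and synchronize, and the use of
plain lists, are assumptions for the sake of a sketch; the reliability
problems mentioned above are not addressed here.

```python
def pages_to_refresh(current_revisions, backup_revisions):
    """Return page numbers whose backup copy is missing or out of date."""
    stale = []
    for page_no, revision in enumerate(current_revisions):
        if page_no >= len(backup_revisions) or backup_revisions[page_no] != revision:
            stale.append(page_no)
    return stale


def synchronize(database, current_revisions, backup, backup_revisions):
    """Copy only stale pages into the backup and record their revisions."""
    copied = pages_to_refresh(current_revisions, backup_revisions)
    for page_no in copied:  # ascending order, so new pages append in sequence
        if page_no < len(backup):
            backup[page_no] = database[page_no]
            backup_revisions[page_no] = current_revisions[page_no]
        else:
            backup.append(database[page_no])
            backup_revisions.append(current_revisions[page_no])
    return copied
```

Only pages whose revision differs get copied, so a backup that is
already current costs one pass over the revision list and no page reads.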

>12. There are side details to be backuped : number of files in the database,
>their names, and maybe still some more info. The physical format of the
>backup file needs headers to keep this information. The notion of "page
>address" (a file, and a page inside that file) needs to be conveniently
>defined. And so on...

There is a school of thought that suggests that depending on information
stored in an object to restore the object in the face of disaster has
certain disadvantages... On the other hand, doing the accounting so
a poor mortal can determine his exposure should the computer gods have
an off day is very important.

>My main goal here is to make sure someone will at least verify (or help me
>verify when I'll be able to look at the code) if this backup scheme has any
>chances of being implementable without too much hacking in the current
>engine code.
>This is certainly not a perfect disaster recovery scheme.

No, but an excellent start.

Jim Starkey