Subject: Full and Differential Backup
Author: Olivier Mascia
Dear all,

That's really a 'blind' proposal here; I'll have to learn much from the soon-to-be-available
source code to analyze the potential difficulties of implementation. Consider the following
a rough proposal for a backup method. It would most probably imply an ODS change, though a
minor one, and some engine modifications, maybe major ones.

I have not given any thought to a possible implementation in Classic Server; I'm biased in
favor of the SuperServer architecture. I implemented a similar scheme on another database
server (not SQL-oriented) years ago. I am basically reassembling some ideas and details
here, because the detailed technical documentation was lost and only the C/C++ code is
still available.

1. The backup processing is a server matter. The client is not concerned with the
operation, except for starting it, possibly reporting its progress, stopping (cancelling)
it, and reporting its final status when finished.

2. The backup processing is exclusive. There can be one and only *one* backup in progress
per database.

3. The backup processing requires a separate processing thread, and synchronization of this
thread's actions with all engine actions that modify pages. To be considered modified
("touched"), a page does not need to have been rewritten to disk (cache). So the needed
synchronization will have to occur where logical modifications to pages are made. The
backup thread only exists while a backup is in progress.

4. The whole process requires being able to identify any page unambiguously at any time,
and being able to "know" whether some page comes before or after another one, sequentially
speaking, in the physical files.
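
To fix ideas, here is the kind of minimal "page address" I have in mind. All the names in
this and the later sketches are invented for the discussion; none of this is actual engine
code:

#include <cstdint>

// Hypothetical page address: a physical file plus a page number inside it, with a
// total ordering so we can say whether one page comes "before" another in the scan.
struct PageAddress {
    uint32_t file;   // which physical file (0 = main database file)
    uint32_t page;   // page number within that file
};

inline bool operator<=(const PageAddress& a, const PageAddress& b)
{
    return a.file < b.file || (a.file == b.file && a.page <= b.page);
}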

5. If implementation is feasible, this backup technique works at the physical level (page
level) and gives you a backup file which can (in the non-differential mode of operation) be
restored later to give you a complete image of the entire database as of the time the
backup finished, and not as of the time the backup started.

6. Here is the basic idea. The backup is a sequential, page-by-page scan of the main
database file, followed by each secondary file. Essentially, the backup output (which may
optionally be more or less compressed on the fly) is a copy of each page in the physical
order in which it is read. When the backup is about to read a page, all engine write
operations to that page must be able to be suspended (this is the synchronization issue
introduced above). The page is read (from disk or memory cache, depending on where the
"current" instance of the physical page is) and appended to the backup file. The write
operations of the engine to that page (which were suspended) are resumed. The backup thread
keeps one piece of current-page information: the "highest" page that has been copied to the
backup. The engine's internal operations have to be modified to check (synchronization
again) that "highest" backed-up page before any attempt to update any page. The key idea is
this: if the page about to be modified HAS NOT yet been processed by the backup thread,
green light, the engine may proceed normally. If the page HAS ALREADY been backed up, the
page identification (can we speak of a page number?) is appended to a log. That log may be
in RAM or in a sequential, append-only disk file. And the engine then updates the page.
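
To make that check concrete, here is a very rough C++ sketch of the engine-side hook,
reusing the PageAddress type from the sketch above (BackupState and the function name are,
again, pure inventions for the discussion):

#include <mutex>
#include <vector>

struct BackupState {
    std::mutex lock;                        // guards highestCopied and touchedLog
    bool inProgress = false;
    PageAddress highestCopied {0, 0};       // "highest" page already written to the backup
                                            // (a real version needs a "nothing copied yet" state)
    std::vector<PageAddress> touchedLog;    // pages modified after having been copied
};

// Called by the engine just before it modifies ("touches") a page.
void onPageAboutToBeModified(BackupState& bk, const PageAddress& pg)
{
    std::lock_guard<std::mutex> guard(bk.lock);
    if (!bk.inProgress)
        return;                        // no backup running: nothing to do
    if (pg <= bk.highestCopied)
        bk.touchedLog.push_back(pg);   // already backed up: remember it for pass two
    // otherwise the page has not been copied yet: green light, proceed normally
}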

7. Where does this lead? Pages are quickly read, sequentially. They are quickly written,
sequentially. While the copy is in progress, the engine is free to update any page not yet
written, and forced to log somewhere the address of every page it modifies after that page
has already been copied. When the full scan of the entire database is done, what do we
have? A copy of each page, but not a coherent one. Each page was copied while "quiet" (no
physical update in progress), but the whole thing is not coherent, because some pages were
modified after being copied and before the backup reached the end of this first pass. If
the engine were completely off during this pass (database not open, for instance), we would
have a coherent binary copy (like an OS copy) of the database.

8. Now for pass two. During this phase, any page about to be touched by the engine
(existing pages, or new pages added at the end of the database file) has to be appended to
the log (not the page content, just its address). The backup thread now starts processing
the log sequentially from the beginning to its end, while the engine quite probably
continues to append to it. There are delicate multi-thread synchronization issues here
again, but nothing impossible to realize. The backup thread gets a page number from the
log, reads the page (again blocking any engine writes to that one while the read is
ongoing), and appends it to the backup file ALONG WITH the address info of that page. This
competitive process continues until the backup thread WINS against the engine updates.
That's the tricky part of the process, yes, but provisions can be taken to ensure that at
one time or another the backup will win its race against the engine. As soon as the last
logged page number is backed up, the backup process may be declared finished, and normal
operation of the engine resumed.
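
Continuing the same sketch, the pass-two loop of the backup thread might look roughly like
this. readPageBlockingWriters() and appendToBackup() are assumed helpers standing in for
the real page I/O, and whatever throttling is needed to guarantee the backup eventually
wins is deliberately left out:

#include <cstdint>
#include <vector>

std::vector<unsigned char> readPageBlockingWriters(const PageAddress& pg);              // assumed
void appendToBackup(const PageAddress& pg, const std::vector<unsigned char>& content);  // assumed

void runPassTwo(BackupState& bk)
{
    {
        // From now on every touched page, even one appended past the old end of a
        // file, must be logged: raise the watermark past any possible address.
        std::lock_guard<std::mutex> guard(bk.lock);
        bk.highestCopied = PageAddress{UINT32_MAX, UINT32_MAX};
    }
    size_t next = 0;                        // index of the next log entry to process
    for (;;) {
        PageAddress pg;
        {
            std::lock_guard<std::mutex> guard(bk.lock);
            if (next == bk.touchedLog.size()) {
                bk.inProgress = false;      // log drained: the backup has won its race
                return;
            }
            pg = bk.touchedLog[next++];     // the engine may still be appending behind us
        }
        // Outside the lock: copy this page along with its address info.
        appendToBackup(pg, readPageBlockingWriters(pg));
    }
}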

9. To restore such a backup, the restore utility reads sequentially the pages stored during
phase one and reconstructs (a sequential process) the database files. The restore continues
by reading sequentially each page written during phase two and, thanks to the address info
saved along with the page content itself, rewrites that page at the right place. Yes, a
given page may have been backed up many times during the process. The restore does not have
to make intelligent decisions: it takes a page, writes it where it needs to be written, and
processes the next one in the backup file.
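
To show how dumb the restore can afford to be, here is what a naive restore loop could look
like for a single-file database, assuming a backup stream made of a phase-one page dump
followed by phase-two (page number, page content) records; the stream layout is of course
invented:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

const std::size_t PAGE_SIZE = 4096;          // assumed fixed page size for the sketch

void restore(std::FILE* backup, std::FILE* database, uint64_t phase1PageCount)
{
    std::vector<unsigned char> page(PAGE_SIZE);

    // Phase one: pages were dumped in physical order, so write them back sequentially.
    for (uint64_t i = 0; i < phase1PageCount; ++i) {
        std::fread(page.data(), 1, PAGE_SIZE, backup);
        std::fwrite(page.data(), 1, PAGE_SIZE, database);
    }

    // Phase two: each record carries its page number; seek and overwrite, last write wins.
    uint64_t pageNumber;
    while (std::fread(&pageNumber, sizeof pageNumber, 1, backup) == 1) {
        std::fread(page.data(), 1, PAGE_SIZE, backup);
        std::fseek(database, long(pageNumber * PAGE_SIZE), SEEK_SET);  // 64-bit seek in real life
        std::fwrite(page.data(), 1, PAGE_SIZE, database);
    }
}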

OK, this is a rough explanation, but I suppose everybody now gets the general picture.

Without anything more, this backup has at least the same disadvantage as gbak. It copies
everything (even the indexes, and some or many pages may even be backed up several times
during the process if they are "hot" pages), and this may be a pain on huge databases. But
it already has a big advantage: speed. And with a few small ideas, you can add a
"differential" (at the page level, not the record level) mode to this process.

There are no original ideas in this, and more knowledgeable people would surely have
expressed it in a thousand fewer words than I have. My question: is such a backup scheme
something that could be implemented? It all depends on many deep internal details of the
engine, so it may turn out to be totally inapplicable.

In the current implementation for that other kind of database system, we have a small
optimisation. There is a concept of a page "zero" in each physical file of the DB that may
well be updated thousands of times during the course of the backup. The only "optimized"
thing done is to NOT back up those zero pages during phase one: consider them as implicitly
always touched, and append them at the end.
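
In the sketch started above, the phase-one loop with that page-zero handling could look
like this (pageCount and the two helpers are the same assumptions as before):

void phaseOneCopy(BackupState& bk, uint32_t file, uint32_t pageCount)
{
    for (uint32_t p = 0; p < pageCount; ++p) {
        PageAddress pg{file, p};
        {
            std::lock_guard<std::mutex> guard(bk.lock);
            bk.highestCopied = pg;           // raise the watermark first, so a concurrent
                                             // write to pg is logged and redone in pass two
            if (p == 0) {
                bk.touchedLog.push_back(pg); // page zero: implicitly touched, defer it
                continue;                    // to pass two instead of copying it now
            }
        }
        appendToBackup(pg, readPageBlockingWriters(pg));
    }
}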

With InterBase and MGA, this also implies that the restored database will be in the exact
state it had at the time the backup ended, with potentially a lot of transactions started
but not yet ended at that time. So the whole scheme would be defeated if the server were
not able to recognize all those transactions as abandoned (limbo transactions?) when it
restarts on the freshly restored database.

This also implies a certain loss of work (due to those abandoned transactions). The loss
would not be huge in a well-designed database application (one that keeps transactions
active for as little time as possible). An application that keeps transactions active for
extended periods of time would risk losing a lot of data with such a backup. But I consider
this a trade-off for being able to get a quick backup while the DB is fully live, with no
restrictions imposed.

10. For differential backups, one could imagine having a backup counter in each page, and a
"last full backup done" number in a main header page for the database. After a full backup,
the idea would be to raise that global counter to the highest backup number encountered in
any of the pages. So basically, after a full backup, we know that it covered all pages
marked with that number or a smaller one.

11. During normal processing, the engine simply sets each modified page's counter to that
global value + 1. A differential version of the backup would proceed as described above,
but would skip over any page whose counter is less than or equal to the current master
counter. So we only copy and take care of pages modified AFTER the full backup happened.
Restoring would imply restoring a full backup as discussed above, then restoring the
differential stream, page by page, thanks to the address info of each page.
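
A tiny sketch of how those two counters could interact, with field and function names
invented for illustration (the real ODS layout would obviously differ):

#include <cstdint>

struct PageHeader {
    uint32_t backupCounter;    // stamped by the engine each time the page is modified
    // ... rest of the on-page header ...
};

struct DatabaseHeader {
    uint32_t lastFullBackup;   // "last full backup done" number in the main header page
};

// Engine side: mark a modified page as newer than the last full backup.
void stampPage(PageHeader& page, const DatabaseHeader& hdr)
{
    page.backupCounter = hdr.lastFullBackup + 1;
}

// Backup side: a differential pass copies only pages touched since that full backup,
// skipping any page whose counter is less than or equal to the master counter.
bool includeInDifferential(const PageHeader& page, const DatabaseHeader& hdr)
{
    return page.backupCounter > hdr.lastFullBackup;
}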

Why a backup counter instead of a simple bit saying "modified since last full backup"?
Well, it offers a kind of traceability. You can ask the system to do a differential backup
relative to any previous full backup in the past. That may (or may not) be useful under
certain circumstances. The full backup would also contain its "number" in the backup file.
That would allow runtime checks to NOT allow restoring a differential backup (taken against
level 7, for instance) on top of full backup 5 (the updates in between would be missing, of
course).

12. There are side details to be backed up: the number of files in the database, their
names, and maybe some more info. The physical format of the backup file needs headers to
keep this information. The notion of a "page address" (a file, and a page inside that file)
needs to be conveniently defined. And so on...
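
Just to give an idea of what those headers might carry, here is a guess at the kind of
records the backup file format would need; the layout and field names are nothing more than
placeholders:

#include <cstdint>

struct BackupFileHeader {
    uint32_t formatVersion;    // so future tools can recognize the stream
    uint32_t backupNumber;     // the counter from point 10, for differential checks
    uint32_t fileCount;        // number of physical files in the database
    uint32_t pageSize;         // page size in use when the backup was taken
    // followed by fileCount file-name records, then the page stream itself
};

struct PageRecord {
    uint32_t file;             // "page address": which physical file...
    uint32_t page;             // ...and which page inside that file
    // followed by pageSize bytes of page content
};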

Despite the tricky issues involved in such a backup scheme, its advantages are the speed of
the process (even if engine performance will be lowered while a backup is in progress, of
course, due to the thread synchronization issues), the fact that the output is sequential
and can be compressed on the fly, and that it allows for a page-level differential way of
backing up.

My main goal here is to make sure someone will at least verify (or help me verify once I am
able to look at the code) whether this backup scheme has any chance of being implementable
without too much hacking in the current engine code.

This is certainly not a perfect disaster recovery scheme.

Questions, comments, and demolition of this idea are welcome!

---------------------------------------------------------------------
Olivier Mascia T.I.P. Group SA
om@... www.tipgroup.com
Director, Chief Software Architect +32 65 401111