Subject Re: GBAK hangs with GC
Author Adam
> What's a backup's use if never being restored ;-)

Well, most of us hope that we never need to restore our backups. The
purpose of most backups is that, should there be a catastrophic
failure, we haven't lost everything. If I never have to restore a
backup file in a production system, then that is good news.

Having gbak perform garbage collection is usually beneficial. It means
that garbage is removed from the database regularly without having to
schedule an explicit sweep. It also means that (usually) the garbage
collection happens at off-peak times, and that garbage which builds up
in rarely touched tables still gets addressed.

There is also the possibility of wanting to back up and then restore
immediately. This shrinks the database, rebuilds indices, resets the
transaction counters etc. In this case, there is no point in cleaning
up the production database as the backup goes. The old database is
about to be deleted and the backup file doesn't even record the
garbage records, so there would be no point waiting for this process.

In answer to your original question, Firebird 2 introduces a new index
structure that makes it much cheaper to clean up an index with lots of
duplicate values, so in that respect garbage collection will be
faster, possibly even a non-issue.

>
> But seriously, what is the suggested way to do a massive delete on a
> (fairly large = 8 GB) database?
> Obviously you have to do something because it might corrupt the
> database otherwise.
> One of our customers corrupted their database twice. And each time
> they did a massive delete before without a backup/restore.
>

You seem a little confused, or maybe you are using the wrong word.
Where is the corruption coming from? Are you maybe using more rows
than the number of internal record numbers available?

Totally removing the data is a two-phase process. First, your
transaction must flag all the records as deleted and then commit.
Then, at some later point when no transaction could possibly be
interested in the deleted record versions, garbage collection will
happen. By default, gbak will do this garbage collection.
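
To make the two phases concrete, here is a toy model in Python. It is
only a sketch of the idea, not how the engine stores anything: the
RecordVersion class, the transaction numbers and the "oldest active
transaction" test are simplifications I made up for illustration, and
the deleting transaction is assumed to have committed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordVersion:
    """Hypothetical stand-in for a row; not a real Firebird structure."""
    key: int
    deleted_by: Optional[int] = None  # id of the transaction that flagged it


def delete_all(records: List[RecordVersion], txn_id: int) -> None:
    """Phase 1: flag every record as deleted. Nothing is physically removed."""
    for rec in records:
        if rec.deleted_by is None:
            rec.deleted_by = txn_id


def garbage_collect(records: List[RecordVersion],
                    oldest_active_txn: int) -> List[RecordVersion]:
    """Phase 2: keep a flagged record only while some active transaction
    is old enough that it might still be interested in it."""
    return [rec for rec in records
            if rec.deleted_by is None or rec.deleted_by >= oldest_active_txn]


records = [RecordVersion(k) for k in range(3)]
delete_all(records, txn_id=10)

# Transaction 8 is still running, so the flagged rows must stay on disk.
still_there = garbage_collect(records, oldest_active_txn=8)

# Once every active transaction is newer than 10, the rows can go.
after_gc = garbage_collect(records, oldest_active_txn=11)
```

The point of the sketch is that the delete itself only marks rows. The
space is reclaimed later, by whichever process happens to do the
garbage collection.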

> So that's
> what I suggested to do, a backup without GC and a restore afterwards.

Precisely. They were wasting time backing up with GC if they had no
intention of continuing to use the database they were backing up. As
they were creating a new database, it makes perfect sense to use the
-g switch.
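
For what it's worth, the backup-without-GC-then-restore cycle could be
scripted roughly as below. This is a sketch only: the file names and
credentials are placeholders, the two helper functions are my own
invention, and it only builds and prints the command lines rather than
running them. The switches are the usual gbak ones (-b for backup, -c
to create a new database from a backup, -g to skip garbage collection).

```python
import shlex
from typing import List


def gbak_backup_cmd(database: str, backup_file: str, user: str,
                    password: str, garbage_collect: bool = True) -> List[str]:
    """Build a gbak backup command line (hypothetical helper)."""
    cmd = ["gbak", "-b"]
    if not garbage_collect:
        cmd.append("-g")  # -g tells gbak NOT to garbage collect during backup
    cmd += ["-user", user, "-password", password, database, backup_file]
    return cmd


def gbak_restore_cmd(backup_file: str, new_database: str, user: str,
                     password: str) -> List[str]:
    """Build a gbak command line that restores into a brand new database."""
    return ["gbak", "-c", "-user", user, "-password", password,
            backup_file, new_database]


if __name__ == "__main__":
    # Placeholder paths and credentials; substitute your own.
    steps = [
        gbak_backup_cmd("prod.fdb", "prod.fbk", "SYSDBA", "secret",
                        garbage_collect=False),
        gbak_restore_cmd("prod.fbk", "prod_new.fdb", "SYSDBA", "secret"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Restoring to a new file name rather than over the original means the
old database is still there if the restore fails for any reason.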

Adam