Subject: Garbage Collection Question (WAS: Re: Slow SELECT/DELETE)
Author: Adam
--- In, "federonline"
<federonline@...> wrote:
> Hi, Adam.
> This brings up a couple issues that relate to our project. We use FB
> 2.0 and our DB is less than 300M. My question is mostly about
> terminology...but some background first....
> We have an application that collects data asynchronously and in real
> time, storing it in a table. The relevance of that data diminishes
> over time and is generally irrelevant after 12 hours. Every 5-10
> minutes, our application launches a thread that purges everything
> older than 2 days (1200-2000 rows, depending on input devices), which
> keeps the DB small.
> My questions are:
> 1) do you use "garbage collection" to mean re-indexing?

No. Firebird uses a technique known as MGA (multi-generational
architecture), a form of MVCC, to achieve transaction isolation. In
short, when you modify a record, instead of overwriting the original,
it writes a delta (the changed fields) tagged with your transaction
number. Depending on whether your transaction has committed, and on
the isolation mode other transactions use (eg snapshot or read
committed), your changes may be ignored when older transactions read
the record.
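As a sketch of how this plays out between two concurrent sessions
(the table and column names here are made up for illustration):

```sql
-- Session 1 (snapshot transaction): update a record
SET TRANSACTION SNAPSHOT;
UPDATE sensor_data SET reading = 42 WHERE id = 1;
-- A new record version (delta) tagged with session 1's transaction
-- number now sits in front of the original.

-- Session 2 (a snapshot transaction that started earlier)
SELECT reading FROM sensor_data WHERE id = 1;
-- Still sees the original value: session 1's version is ignored
-- because, from session 2's point of view, that transaction number
-- is not yet committed.
```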

Eventually, when the database engine realises there is no transaction
that could possibly need an old record version, that version becomes
garbage, and garbage collection removes it to reclaim the space.

This is a great benefit for supporting simultaneous transactions,
because it means you can avoid locking when reading, and you don't
have to resort to hacks like dirty read to keep your system usable
during long reports.

> 2) can you tell me the affect of the constant purging of data on the
> index??

Probably not as serious as you worry, providing the 'shape' of the
data doesn't change too drastically, causing the index statistics to
become inaccurate.

> 3) if the affects are very negative, is there a way programatically
> (through the API) to force an index rebuild or "garbage collection" if
> those terms are almost synonymous?

No, they are not synonymous.

To rebuild an index, disable and re-enable it through the standard SQL
syntax 'alter index'. If you are the only user, and you are doing a
significant number of bulk deletes etc, it doesn't really make sense
to maintain the index for each record deleted. It is far cheaper to
throw away the index, delete your records, then build a new one.
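
For example, assuming an index named idx_sample_time on the purged
table (the names are illustrative, not from your schema), the bulk
delete could be bracketed like this:

```sql
-- Stop maintaining the index during the bulk delete
ALTER INDEX idx_sample_time INACTIVE;

-- In Firebird, subtracting a number from a timestamp subtracts days
DELETE FROM sensor_data
WHERE sample_time < CURRENT_TIMESTAMP - 2;

COMMIT;

-- Re-activating the index rebuilds it from scratch
ALTER INDEX idx_sample_time ACTIVE;
```

Note that you cannot deactivate an index that enforces a constraint
(primary key, unique, foreign key), so this trick only applies to
ordinary indices.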

Garbage collection happens automatically. A record version becomes
garbage once the transaction that created it has properly committed
(not commit retaining, a proper commit) and no older transaction could
possibly be interested in that version. In cooperative mode, if you
happen to be the unlucky one to stumble upon a bunch of deleted
records after the engine has identified them as garbage, you get to
clean them up.

If you are the only user, then simply commit, then run

select count(*) from problemtable;

The full table scan visits every record version and cleans up the
garbage as it goes.

> I have to say, generally we have no issues, but have also only let the
> system run for 2-3 months before bringing it down, running a backup,
> running gfix and a sweep (none of which have ever presented an error).
> I'm just nervous about deploying it for a year without anyone to
> service it.

The worst that is going to happen is that performance will drop, and
this can be cleaned up with a backup-restore cycle. But proper
transaction handling (not leaving transactions running longer than
necessary) goes a long way to minimising this problem.
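
If you can schedule occasional maintenance, the backup-restore cycle
with gbak is straightforward to script (the file names, user and
password below are placeholders):

```shell
# Back up the database to a portable backup file
gbak -b -user SYSDBA -password masterkey mydb.fdb mydb.fbk

# Restore into a fresh database file; this rebuilds every index and
# leaves no old record versions behind
gbak -c -user SYSDBA -password masterkey mydb.fbk mydb_new.fdb
```

Restoring to a new file and swapping it in, rather than restoring over
the original, means you still have a working database if the restore
fails partway.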