firebird-support - Re: [firebird-support] How does sweeping works?

Subject	Re: [firebird-support] How does sweeping works?
Author	Ann W. Harrison
Post date	2009-08-27T14:24:38Z

Daniel Vogelbacher wrote:

> On [Thu, 27.08.2009 12:51], Dimitry Sibiryakov wrote:

>>> If I run a manual sweep every night, the select query
>>> should run fast, isn't it?
>> No. Speed of select isn't so much related to sweep or garbage collection.
>
>
> Uhmm, now I'm totally confused.

OK. Things that can affect the speed of a query include
the use of indexes, the number of irrelevant rows that must
be read, and the cost of getting rid of obsolete records
and index entries.

If you use a selective index and don't read any records that
have obsolete versions that need to be garbage collected (or
if you've chosen the thread-based garbage collector) the
presence of obsolete records doesn't matter, generally. It
might matter slightly if there were huge numbers of obsolete
records that made the index deeper than normal.

If your database is full of old versions of records and
you don't use an index, or if the index is not selective,
you look at lots of irrelevant records even if your transaction
doesn't garbage collect them. That will slow your query
slightly.

If the database is full of old versions of records that your
transaction reads and must garbage collect, then you query is
going to be very slow. If your query will touch most of the
obsolete records in the database, it will be nearly as slow
as running a sweep first, then running the query.

>
> The query is slow, as far there is only one (active) record.
> While count() runs, GC cleans up data & index (or not?).

Yes, if there is anything to clean up - meaning both that there
are old record versions and index entries and that those old
versions are not relevant to any running query.

> Because the data (which slows down the query)
> is generated by a rolled-back transaction,
> I think I need to run a sweep before executing the SELECT
> statement (or not?)

As above, sweep plus query is slightly slower than just running
the query. Both are slow. The sweep plus query is slower because
a) it's two statements and b) the sweep reads the whole database,
not just the table will all the rolled back records.

If your application regularly inserts hundreds of thousands of
records in a single transaction that rolls back, you might consider
using temporary tables.

Good luck,

Ann