Subject: Re: [firebird-support] Re: Firebird 1.5 database size increasing
Author: Ann W. Harrison
Hi Bill,

[gstat results moved to the end]

OK. When the table is big, it seems to have about 325,000
records, most of which have one back version, altogether
using 4,891 pages. OK, that works out to about 61 bytes
per record, assuming a 4K page size.

It is odd that the record length is reported as 0.03 and
the version length as 21.45. In theory, the version is
never longer than the record. Hmmm.... OK, I've got it.
The primary record version is just a deleted stub. Right.
Store 325,287 records then delete 324,898 of them.

That works pretty well with the estimate of 61 bytes:
22 bytes of data, 17 bytes of record header plus page
index, and another 17 bytes for the deleted stub makes 56,
and the remaining few bytes per record go to page header
overhead and unusable space.
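The arithmetic above can be checked quickly. This is just a sketch in Python; the 22/17/17 byte figures are the approximations used above, not exact on-disk sizes:

```python
# Back-of-envelope check of the space estimate above, using the
# figures from the gstat output quoted in this thread.

PAGE_SIZE = 4096          # 4K pages, as assumed above
pages = 4891
records = 325_287

bytes_per_record = pages * PAGE_SIZE / records
print(f"{bytes_per_record:.1f} bytes per record")   # ~61.6

data = 22                 # data per record
header = 17               # record header plus page index entry
stub = 17                 # deleted stub for the primary version
accounted = data + header + stub
print(f"{accounted} bytes accounted for")           # 56
print(f"{bytes_per_record - accounted:.1f} bytes of overhead per record")
```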

When the table has "settled", it has 481 records and uses
21 pages. That seems pretty good.

> Of course, since the data pages are never released, the database file
> size grows.

Right, but if the same sort of operation occurs again, the space
that was released will be reused for the next 300,000 temporary
records. So there's a lot of initial growth as the database
stretches to fit the largest amount of data that will be stored,
then growth should be slow as space is freed and reused rather
than allocated new.
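The reuse behavior can be sketched as a toy free-list allocator. This is illustrative only, not Firebird's actual page allocation code:

```python
# A toy model of the reuse described above: pages freed by garbage
# collection go on a free list and are handed out again before the
# file is extended, so the second big batch causes no new growth.

class File:
    def __init__(self):
        self.size = 0          # pages allocated from the OS
        self.free = []         # pages released by garbage collection

    def allocate(self):
        if self.free:
            return self.free.pop()   # reuse freed space before growing
        self.size += 1
        return self.size - 1

    def release(self, page):
        self.free.append(page)

f = File()
first_batch = [f.allocate() for _ in range(5)]   # file grows to 5 pages
for p in first_batch:
    f.release(p)                                 # records deleted, pages freed
second_batch = [f.allocate() for _ in range(5)]  # same 5 pages come back
print(f.size)                                    # still 5 -- no new growth
```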

> My user says:
> All transactions that are actually changing data, do a commit and
> succeed. All hibernate operations are taking place within a database
> transaction. This transaction is always rolled back. These
> transactions will be across many tables, not just the 2 we are
> focusing on.

What's a hibernate operation? Maybe I'm having a bad day, but
that description doesn't make any sense to me at all. Leaving
aside the hibernation issue, it seems to say that all transactions
that change the database commit. From the stats, it appears that
one group of transactions stores a lot of records. The database must
be large enough to hold all those records. Later another group of
transactions deletes all those records. That's OK. The database
has to grow by about 16 bytes per record to hold the deleted stubs.

If transactions were rolling back rather than deleting records -
so the inserts were never committed - you wouldn't see 325,000
records. And if the records were committed, they can't be rolled
back. My guess is that the rollback refers to a read-only
transaction and is not relevant here.

> So, now I am looking at the test case, to see if the rollback could be
> replaced with a delete operation instead. I think this would reduce
> the number of back versions created in this table.

No. If records actually were being inserted and rolled back -
which some may be - they would be cleaned up automatically
during the rollback. Gstat might report their existence if
it ran after some inserts and before the rollback completed,
but it wouldn't report back versions at all, since back versions
are created only when a stored record is updated or deleted. An
insert and rollback won't create back versions.

So changing a rollback to an insert/delete actually increases the
number of back versions, even if it's all done in one transaction.
When a transaction inserts and deletes a record, the result is a
primary record that's a delete stub and a back version with the
original data. That may not be obvious, but verb-level rollback and
savepoints require that we keep a history of the changes made by
each transaction - not just the final result. In fact, in some
cases, subsequent updates do overwrite earlier updates, but the
case here is not one of them.
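To make the distinction concrete, here is a toy Python model of the bookkeeping described above. The `Record` class and the delete-stub marker are invented for illustration and bear no resemblance to Firebird's on-disk structures:

```python
# Toy illustration: insert+rollback leaves nothing behind, while
# insert+delete in one transaction leaves a delete stub plus a back
# version holding the original data.

class Record:
    def __init__(self, data):
        self.primary = data        # primary record version
        self.back_versions = []    # chain of older versions

def insert_and_rollback(table, key, data):
    table[key] = Record(data)
    del table[key]                 # rollback undoes the insert entirely

def insert_and_delete(table, key, data):
    table[key] = Record(data)
    rec = table[key]
    # deleting a stored record pushes the old data onto the version
    # chain and replaces the primary version with a delete stub
    rec.back_versions.insert(0, rec.primary)
    rec.primary = "<delete stub>"

table = {}
insert_and_rollback(table, 1, "row one")
print(len(table))                        # 0 -- no back versions created

insert_and_delete(table, 2, "row two")
rec = table[2]
print(rec.primary, rec.back_versions)    # delete stub + one back version
```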

> Also, I'm not really clear as to how/when the back versions get
> purged, so I will do some more analysis there.

The answer to that is "it depends". In Vulcan and in Classic,
the garbage collection is done when another transaction attempts
to read a record that has unnecessary versions. In FB 1.x
SuperServer, when a transaction notes the presence of unnecessary
versions it puts them on a list for the garbage collect thread
at its leisure. In Firebird 2, if a transaction notices an
unnecessary record version sometimes it does the garbage collection
itself, and sometimes puts it on the garbage collect thread's
list - depending on the cost of doing the operation immediately.
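The Firebird 2 behavior can be sketched as a cost-based choice. The threshold and the "cost" measure here are invented for illustration; the engine's real heuristics differ:

```python
# Sketch of the cooperative-vs-background decision described above
# for Firebird 2: cheap cleanups happen inline in the reading
# transaction, expensive ones are deferred to the GC thread.
from collections import deque

background_queue = deque()   # work left for the garbage-collect thread

def on_unnecessary_versions(chain_length, inline_threshold=4):
    """Decide whether the reading transaction cleans up itself.

    chain_length stands in for the cost of the cleanup; the real
    engine uses its own cost estimate, not this one.
    """
    if chain_length <= inline_threshold:
        return "collected inline"          # cheap: do it now
    background_queue.append(chain_length)  # expensive: defer it
    return "queued for GC thread"

print(on_unnecessary_versions(2))   # collected inline
print(on_unnecessary_versions(9))   # queued for GC thread
print(len(background_queue))        # 1
```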

My guess is that you should just tell your customer that the
database will grow quickly to hold the largest number of records
stored, then stop growing or grow very slowly.