Subject Re: [firebird-support] firebird database table using too many data pages
Author Helen Borrie
At 08:07 AM 11/07/2007, you wrote:
>Hello over there,
>
>I hope someone could help me in with the following problem:
>
>I have a firebird 1.5 database which has a table named FaxMessage
>having 11397 records, each records uses about 200 bytes and
>according to database statistics this table takes 39801 data pages and
>257577 data page slots ( page size is 4096 bytes ). In the same
>database there is another table named SmtpMessage which has 98527
>records each taking about 200 bytes, but this table uses only 4466
>data pages and the same ammount of data page slot. Records are never
>deleted from these tables and access pattern to both tables is alike.
>
>How is it possible that FaxMessage table is wasting that lot of space ?
>
>Could it be because of a misbehaving application ?

Firebird requests space from the operating system in chunks of
<page_size> -- 4 Kb in this case -- whenever it needs to write
something and it cannot find enough existing space in which to write
it. Any space it acquires remains part of the database permanently
and is re-used when it becomes available after database housecleaning.

Firebird has "multi-version architecture" (MVA), which means that
each transaction that succeeds in posting an update or delete to the
database causes the old version of the record to be retained in the
database as long as its data remains "interesting". It will be
"interesting" until all transactions that have accessed that record
have completed (have committed or rolled back).

As these old record versions become not "interesting", they are
flagged by the engine for garbage collection (GC). How GC works
depends on the model of Firebird: in Superserver, it is a background
process that jumps in when there is idle time and tries to clean up
any freed old record versions. In Classic, the garbage collection
happens whenever a user accesses a table. From Fb 2.0 onward, GC is
a mixture of both styles, occurring opportunistically.

A misbehaving application could have the effect of tying up a lot of
space with garbage that can't be collected. Applications are
responsible for transactions. If the application doesn't take good
care to "hard commit" its read/write transactions regularly, then old
record versions cannot escape from their "interesting" status.

Long-running read/write transactions are the bane of an MVA
database. Database file size inflation is not the only effect, or
even the worst effect. Performance can deteriorate noticeably over
the course of time, notably in applications written with transactions
that use a Commit Retaining feature, a.k.a "soft commit". With this
setting, the transaction's work *does* get committed, but the
transaction is then effectively "restarted". Those "soft-committed"
records don't get cleared for GC, the engine has no choice but to
keep requesting more pages for writing new data and, indeed, you
could say that "space gets wasted".

There are a lot of applications around that use(d) the Borland Delphi
VCL data components. Borland made soft commits a "feature" of its
data access architecture as part of an overall approach to "dumb
down" database programming. Its Borland Database Engine (BDE) makes
it possible for programmers to write database application code
without knowing much about how the underlying database engine
works. In particular, it makes it possible for a programmer who
doesn't understand transactions management and multi-user concurrency
to ignore transactions altogether. The BDE drivers treated
client/server databases such as Firebird, InterBase, SQL Server,
Oracle, etc., as though they were a desktop database system,
specifically Paradox. (The BDE is the Paradox engine...)

The BDE is ancient history now, yet there are still developers out
there who try to use it. It is possible to configure the BDE to
behave more reasonably with InterBase and Firebird but the
all-too-common case has been for programmers to ignore such niceties
and go with the defaults.

The cost of such simplification for database backends that were
designed for multi-user use on networks, with transaction isolation
for achieving concurrency, is poorly-performing, bloated databases.

./heLen