Subject Re: [firebird-support] Re: Firebird DB Size
Author Helen Borrie
At 09:16 PM 25/07/2005 +0000, you wrote:
>--- In firebird-support@yahoogroups.com, Aage Johansen <aagjohan@o...>
>wrote:
> > Did you just load picture files into the blob fields, or did some your
> > program do some conversion (e.g. to bmp)? Did you anything else?
> > Do a backup+restore and see if database size changes.
>
>I did it, but it just changed less than 100 MB. DB size became 1.55 GB
>something. I am just loading jpg pictures to these two fields. There
>is no conversion.
>
> > I have a 500MB database containing more than 12.000 pictures (jpg).
> The
> > sizes varies between 1.5KB and 640KB. If I sum the size of all the
> > pictures I get about 400MB.
> > However, I usually store pictures outside of the database.
>
>In previous mail, I wrote that I dropped two BLOB fields which store
>images and DB became 1.72 MB only for 3000 records. So I guess, it
>best to store images outside of the database, however those images are
>not more than 100 KB and they are usually 40-50 KB.

Do you understand that Firebird does not store data in "data files", like
Paradox, Access, MySQL, etc. do? A Firebird database is one logical file
whose content is controlled by the database engine. As more data is added,
the engine asks the operating system for more disk space. It manages this
disk space as "pages". The operating system doesn't recognise "pages".

The size of a page is determined by the page size you specified when you
created the database. If you did not specify it, the default page size is
4 Kb. The engine asks for more pages as it needs them. It never gives
them back. Space left by deleted records and blobs will be released for
re-use once the engine has completed its garbage collection.

When the engine gets a new page, it is "just a page". Then, depending on
what the page is used for, it is assigned a page type - pointer page, data
page, index page, etc. The data belonging to a blob is stored on a "blob
page" - usually - although very small blobs may actually be stored on the
record's data page. Assume, though, that your 40~100 Kb picture blobs are
stored on blob pages.

A data page can (potentially) store multiple records. The data record
itself stores a Blob ID only, a pointer to the first page where the blob
data is stored.

Blob pages never store data for multiple blobs. So, taking a 73 Kb blob,
it will need 19 pages (76 Kb), since the first 72 Kb will need 18 pages and
the engine must store the last 1Kb on a new page. (In fact, these numbers
are not exactly like this, since each page uses some bytes for its header
data...)

Once a page has been assigned a page type, it remains as that type of page,
even after its data has been deleted. So, after garbage collection, an
empty blob page becomes available for use in storing another
blob. Likewise, an empty data page becomes available for storing another
record.

You can read all about the different page types here:

http://www.ibphoenix.com/main.nfs?a=ibphoenix&page=ibp_expert2

However, the crucial thing here is garbage collection, because of
Firebird's multi-generational architecture. When a client updates a
record, the engine stores both the old and the new version. Until the new
version of the record is committed, both versions are active. The
transaction that updated the record works with the new version, while all
other transactions continue to see the old one. After the client commits
the new version, a "delta copy" of the old record persists in the system
until the automatic garbage collection deletes it and releases the space.
The same preservation occurs when you delete a record and commit it.

Garbage collection cannot clean up and free these old versions if there is
any uncommitted transaction anywhere that is still interested in the old
version. There are some undesirable programming practices around that
cause old record versions to remain "interesting" unnecessarily for long
periods and to resist garbage collection, sometimes indefinitely. As well
as keeping that unused disk space unavailable for storing new data, it can
have some really bad effects on resource usage. These problems won't be
solved unless the programmer addresses them.

There is no "compact" operation that you can do on a Firebird database
because, unlike Access, etc., which are file-base data stores, there is no
correlation between a record and a physical file location that is known to
the operating system's file system. If you have a database file that grows
extraordinarily because ineffective transaction management by client
applications, or even because of users who end their database sessions by
hitting the "Off" switch, you will accumulate garbage that may never be
collected. To "compact" such a database requires a backup and restore.

I'll add some comments about storing blobs in databases in another posting.

./heLen