Subject Re: [firebird-support] Problem with deleting many records in a big database
Author Helen Borrie
At 11:07 AM 9/04/2005 +0200, you wrote:


>Hi,
>we have one special problem: We have to insert a huge amount of data
>(500,000 records a day) into one table of a Firebird 1.0.3 database. The
>whole database has about 60 other tables with significantly less data.
>The data in this special table is needed for only 30 days, so we try to
>delete older data periodically; otherwise the database gets too
>large.
>The system is kind of a 24/7 system.
>
>Our problem is that the garbage collection (GC) of the deleted records
>takes too long (several hours), and new users can't connect to the system
>for as long as the GC runs.

Instead of deleting huge batches every few days, experiment with a
FIFO-style maintenance sequence run daily, during a quiet time when most
of the day's inserts are finished:

delete from ThisTable
/* purge rows older than the 30-day retention window */
where date_inserted < cast('today' as date) - 30;
commit;
/* the full scan visits every row, so the engine can collect the
   record versions freed by the delete as it goes */
select count(*) from ThisTable;
commit;
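
A quick follow-up check, if you want to confirm the retention window is
being maintained (date_inserted is the assumed timestamp column from the
example above):

/* oldest and newest rows remaining after the purge */
select min(date_inserted) as oldest, max(date_inserted) as newest
from ThisTable;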


>We found out that the time GC needs grows faster than proportionally
>with the size of the database (or the table?).

It will increase (roughly) in proportion to the number of records
deleted. A script like the one suggested could help because it releases
the deleted stubs for background GC a day at a time, rather than letting
the deletions accumulate into one huge, single-hit delete-and-GC
operation.

Experiment with typical loads, of course.

And don't overlook the possible slowing effects of constant large
insertions and deletions on indexes. A regular offline task that drops and
recreates the primary key constraint and sets the other indexes inactive
and then active again might also reduce the tendency for indexes to go out
of balance as a result of these frequent heavy batch inserts and deletes.
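
A minimal sketch of such an offline task, assuming a single-column key and
hypothetical object names (pk_thistable, id, ix_thistable_date) that you
would replace with your own:

/* the PK index cannot be set inactive, so drop and recreate the
   constraint instead (needs exclusive access to the table) */
alter table ThisTable drop constraint pk_thistable;
alter table ThisTable add constraint pk_thistable primary key (id);
commit;

/* an ordinary index can be rebuilt in place by toggling it */
alter index ix_thistable_date inactive;
alter index ix_thistable_date active;
commit;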

Actually, you missed two related threads on this very subject in recent
days that would provide more insight. Pick up approximately the last 300
postings in your newsreader (egroups.ib-support news mirror at
news://news.atkin.com) and look for the second thread, starting at message
#52919.


>We have to use Firebird 1.0.3 because the system has to run on Red Hat
>7.2.
>At the moment we use Firebird SuperServer, because Classic didn't seem to
>improve the situation (with Classic the delete itself took much
>longer, although Classic did not block the login of new users).
>
>Questions:
>==========
>Does anybody have any ideas to improve the situation?
>
>Is it "normal" that GC takes more then 10 hours after delete 400.000
>records (out from 10 Mio) on a P4 3GHz system with no other job to do?

It could be expected to take that long (in intermittent chunks) with a lot
of garbage to collect, especially if large amounts of uncollected garbage
are left over from previous cycles because of overlapping operations.
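
If you want to see how much version garbage is actually sitting in the
table before and after a run, gstat's record statistics give a quick
reading (the path is an assumption; run it on the server against the
database file):

gstat -r /data/yourdb.fdb

Look at the version figures reported for that table: large numbers of
back versions mean a backlog of uncollected garbage.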

>(it only takes 5 minutes if 400,000 out of 900,000 records are deleted)

Your timings might not be telling you useful information for comparing one
bulk delete with another, if intensive update operations are also going on
concurrently. Updates leave old record versions for GC, too.

./hb