Subject Re: [firebird-support] Getting garbage info from the database
Author Helen Borrie
At 03:24 PM 2/12/2003 +0200, you wrote:
>Hi There,
>
>We have a client-server application that connects to the database and
>allows the user to view and change some data. At night we have programs
>that run and import new data (from host systems like SAP etc.) and do
>many, many calculations and manipulations of the data, so that when the
>user connects the next day he can see the results of last night's
>import and data calculations. Our application is mission-critical to the
>client: the next day he cannot do his work until our overnight
>processes are done.
>
>One of the things we have that is killing us at the moment (we suspect)
>is garbage. These processes that run overnight (and create garbage)

Inserts don't create garbage. But big inserts can mess up indexes. So if
you are inserting and then following that up with a lot of updates and
deletes, you'll get garbage.
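
To picture it, here's a minimal isql-style sketch (the ORDERS table and its
columns are invented for illustration, not taken from your post):

   -- the insert itself leaves no old record versions behind
   INSERT INTO ORDERS (ID, STATUS) VALUES (1, 'NEW');
   COMMIT;

   -- each committed update leaves the previous version of the row sitting
   -- in the data pages as garbage until it is collected
   UPDATE ORDERS SET STATUS = 'PROCESSED' WHERE ID = 1;
   COMMIT;

   -- a delete leaves a stub behind that also waits for garbage collection
   DELETE FROM ORDERS WHERE ID = 1;
   COMMIT;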

>have a limited time window in which to finish. If an overnight batch
>blows out by 6 hours, the client is stuffed and furious. This makes our
>lives very difficult, as we are forever trying to fight fires.
>
>We need help so here are a couple of questions:
>
>1.) Can someone point me to an in-depth tutorial about garbage in the
>database?

In my forthcoming book. :-)

>
>2.) How do I find out, at any given stage, how much garbage there is in
>the database?

You can't, really, except by shutting the database down and looking at the
growth of the file(s).

>
>3.) How do I find out where (in which tables) the garbage is affecting
>the database?

If your big batch operation is causing this, then it will be those
tables. Garbage is old record versions and index stumps. The garbage sits
in pages where your active data are stored. If you are using one huge
transaction to do everything, you'll slow things down horribly, because
record version will pile upon record version for all the rows the
transaction works on. "Double-dipping" the same record multiple times in a
single transaction causes this.
-- break up the inserts into chunks of 7,000-10,000 rows and make sure you
use a hard commit after each chunk (see the sketch below).
-- commit the inserts before starting to do DML on the new rows.
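
As a rough sketch of what "chunks plus hard commits" looks like in an isql
batch script (SALES, IMPORT_STAGE and the key ranges are invented names,
purely for illustration):

   -- load the new rows in slices, hard-committing each slice
   INSERT INTO SALES (ID, AMOUNT)
     SELECT ID, AMOUNT FROM IMPORT_STAGE WHERE ID BETWEEN 1 AND 10000;
   COMMIT;

   INSERT INTO SALES (ID, AMOUNT)
     SELECT ID, AMOUNT FROM IMPORT_STAGE WHERE ID BETWEEN 10001 AND 20000;
   COMMIT;

   -- only after the inserts are committed, start the updates and deletes,
   -- again in committed chunks rather than one huge transaction
   UPDATE SALES SET AMOUNT = AMOUNT * 1.1 WHERE ID BETWEEN 1 AND 10000;
   COMMIT;

Each COMMIT here is a hard commit; CommitRetaining would keep the same
transaction alive and defeat the purpose.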

Huge deletes and inserts mess up indexes. You could experiment with
setting indexes inactive for the inserts and then active again for the
DML. A big part of your slowdown is probably from unbalanced indexes.
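
In Firebird that experiment looks something like this (IDX_SALES_CUST is a
made-up index name):

   -- switch the index off before the bulk insert
   ALTER INDEX IDX_SALES_CUST INACTIVE;
   COMMIT;

   -- ... run the bulk inserts, committing in chunks ...

   -- switch it back on afterwards: this rebuilds the index from scratch,
   -- so it comes back freshly balanced as well
   ALTER INDEX IDX_SALES_CUST ACTIVE;
   COMMIT;

Note that indexes which enforce primary key, unique or foreign key
constraints can't be switched inactive this way; only ordinary indexes can.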

>
>4.) What is the best way to get rid of the garbage when there is
>garbage? We currently fire <select count(*) from TableName> at the
>database to get rid of garbage on a specific table.

That will do it.
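
In other words, something like this in isql (TableName standing in for the
table your batch job has just finished working over):

   -- the unindexed full scan visits every row, and visiting a row is what
   -- lets its old, no-longer-needed versions be cleaned out
   SELECT COUNT(*) FROM TableName;
   COMMIT;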

>
>5.) Why does garbage sometimes make the database grind to a halt?

Not hard-committing update and delete work often enough. But it might not
be garbage alone that is slowing things down. Take care of the indexes.

>
>6.) Sometimes garbage collection is quick (10 min) and sometimes it
>takes a long time (12 hours). Why?

Garbage collection goes on all the time in the background, if it can.
Without good transaction management you inhibit that background GC, and so
the first transaction after your job completes gets hit with the whole load.

/heLen