Subject Re: [firebird-support] Oldest Snapshot slowing down DB
Author Helen Borrie
At 03:26 AM 8/02/2006, you wrote:
>Hello,
>
>we are using firebird 1.5 super server on ubuntu linux and are
>experiencing an odd slow down after about 15 hours intensive use.
>The DB server is servicing many short-ish transactions from a pool of
>about 150 connections and the db size on disk is 1gig after a day.

Can you expand on that a bit? Is the db growing on a daily basis
more than you think it should?


>All seems nice and fast for about 12 hours then gstat reveals the
>following oddity, an unusually old Oldest snapshot when oldest
>transaction seems to be normal.
>
>------------------
>Database header page information:
> Flags 0
> Checksum 12345
> Generation 9188550
> Page size 4096
> ODS version 10.1
> Oldest transaction 9181346
> Oldest active 9181347
> Oldest snapshot 9178294
> Next transaction 9188549
> Bumped transaction 1
> Sequence number 0
> Next attachment ID 0
> Implementation ID 19
> Shadow count 0
> Page buffers 0
> Next header page 0
> Database dialect 3
> Creation date Feb 6, 2006 12:40:09
> Attributes
>
> Variable header data:
> *END*
>-----------------------

I don't know what you mean by "when oldest transaction seems to be
normal". The gap betwee the oldest [interesting] transaction (OIT)
and the oldest active (OAT) are OK for now, since they are virtually
the same. But 7,200-odd transactions have started since that oldest
active transaction started. That means you've got a garbage buildup in there.

>I cannot find much information on what Oldest Snapshot actually means,

Oldest Snapshot is a snapshot of the OAT last time garbage collection
was done. So in your database at the time this gstat was done,
you've got uncollected record versions around the place from > 10,000
transactions.

>and we are pretty sure its not the program leaving open transactions
>accidently as Oldest transaction is fine

?

>and we have timers in the code to check for possible leaked transactions.

What's a "leaked transaction"?


>Restarting the firebird process using the init.d script sorts it out
>and the db speeds up again to normal, but this isnt really a desirable
>solution.
>
>Any light on the situation would be greatly appreciated,

I'd want to be looking at this garbage buildup. Basically, 10,000
transactions hanging around, in concert with your observation of
slowdown, suggests that GC isn't keeping up with your throughput a
lot of the time. Quite a lot of those 10,000 transactions probably
have garbage but the GC thread isn't able to reach a lot of it.

The actual number of transactions per se isn't the issue. You want
to use a cron job to capture gstat -h, along with a timestamp, into a
text file at regular intervals through busy periods over 2-3
days. By following the oldest active and the oldest snapshot, you'll
get an idea of what you "clearance rate" is.

Unfortunately, a common cause of the phenomenon you seem to be
experiencing is a SELECT statement in a read-write txn that just
stays open forever. If your front-end is Delphi or Builder (which
you don't say) it's extremely easy to fall into that trap.

In another post you asked about setting a sweep interval. With the
current numbers it doesn't look like a solution (but plotting your
stats might reveal something differently). Even then, keeping the SI
at zero and scheduling a sweep at lunchtime (or some other slow time)
might help to reduce what the GC thread has to tackle each time it
gets a shout from the engine.

There's also the possibility that it's the GC thread itself that's
slowing things down; so stopping the database is only going to make
matters worse, by continually preventing it from finishing its
job! Each time you restart, the garbage heap starts that much higher
and GC just never sees the light at the end of the tunnel.

Instead of taking the database down and possibly knee-capping GC,
give someone a little utility to run a sweep on the server when
things get unbearably slow. You don't have to shutdown the db to do
a manual sweep; but, because sweep isn't a background process,
you'll potentially shorten the *length* of time it takes to de-garb
and actually *complete* it once in a while.

./heLen