Subject Re: Oldest Snapshot slowing down DB
Author imoffskating
Thanks again for your kind and detailed reply, it has defintely shed
some light on the issue.

i've added in some more comments inline below...



--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@...> wrote:
>
> At 03:26 AM 8/02/2006, you wrote:
> >Hello,
> >
> >we are using firebird 1.5 super server on ubuntu linux and are
> >experiencing an odd slow down after about 15 hours intensive use.
> >The DB server is servicing many short-ish transactions from a pool of
> >about 150 connections and the db size on disk is 1gig after a day.
>
> Can you expand on that a bit? Is the db growing on a daily basis
> more than you think it should?
>

the db is growing as we expect with about 500+MB a day of data added


>
> >All seems nice and fast for about 12 hours then gstat reveals the
> >following oddity, an unusually old Oldest snapshot when oldest
> >transaction seems to be normal.
> >
> >------------------
> >Database header page information:
> > Flags 0
> > Checksum 12345
> > Generation 9188550
> > Page size 4096
> > ODS version 10.1
> > Oldest transaction 9181346
> > Oldest active 9181347
> > Oldest snapshot 9178294
> > Next transaction 9188549
> > Bumped transaction 1
> > Sequence number 0
> > Next attachment ID 0
> > Implementation ID 19
> > Shadow count 0
> > Page buffers 0
> > Next header page 0
> > Database dialect 3
> > Creation date Feb 6, 2006 12:40:09
> > Attributes
> >
> > Variable header data:
> > *END*
> >-----------------------
>
> I don't know what you mean by "when oldest transaction seems to be
> normal". The gap betwee the oldest [interesting] transaction (OIT)
> and the oldest active (OAT) are OK for now, since they are virtually
> the same. But 7,200-odd transactions have started since that oldest
> active transaction started. That means you've got a garbage buildup
in there.
>
> >I cannot find much information on what Oldest Snapshot actually means,
>
> Oldest Snapshot is a snapshot of the OAT last time garbage collection
> was done. So in your database at the time this gstat was done,
> you've got uncollected record versions around the place from > 10,000
> transactions.
>
> >and we are pretty sure its not the program leaving open transactions
> >accidently as Oldest transaction is fine
>
> ?


garbage collection build up seems to be whats happening. Thanks for
pointing that out.

What i meant by " we are pretty sure its not the program leaving open
transactions accidently as Oldest transaction is fine" is that we are
pretty certain we are committing and closing all our connections so
there are no connections left open to the db accidentty.


>
> >and we have timers in the code to check for possible leaked
transactions.
>
> What's a "leaked transaction"?
>

Sorry my bad explanation. i mean we have a timer in our java code
which checks to see if we have accidently not closed a connection.


>
> >Restarting the firebird process using the init.d script sorts it out
> >and the db speeds up again to normal, but this isnt really a desirable
> >solution.
> >
> >Any light on the situation would be greatly appreciated,
>
> I'd want to be looking at this garbage buildup. Basically, 10,000
> transactions hanging around, in concert with your observation of
> slowdown, suggests that GC isn't keeping up with your throughput a
> lot of the time. Quite a lot of those 10,000 transactions probably
> have garbage but the GC thread isn't able to reach a lot of it.
>
> The actual number of transactions per se isn't the issue. You want
> to use a cron job to capture gstat -h, along with a timestamp, into a
> text file at regular intervals through busy periods over 2-3
> days. By following the oldest active and the oldest snapshot, you'll
> get an idea of what you "clearance rate" is.
>
> Unfortunately, a common cause of the phenomenon you seem to be
> experiencing is a SELECT statement in a read-write txn that just
> stays open forever. If your front-end is Delphi or Builder (which
> you don't say) it's extremely easy to fall into that trap.
>
> In another post you asked about setting a sweep interval. With the
> current numbers it doesn't look like a solution (but plotting your
> stats might reveal something differently). Even then, keeping the SI
> at zero and scheduling a sweep at lunchtime (or some other slow time)
> might help to reduce what the GC thread has to tackle each time it
> gets a shout from the engine.
>
> There's also the possibility that it's the GC thread itself that's
> slowing things down; so stopping the database is only going to make
> matters worse, by continually preventing it from finishing its
> job! Each time you restart, the garbage heap starts that much higher
> and GC just never sees the light at the end of the tunnel.
>
> Instead of taking the database down and possibly knee-capping GC,
> give someone a little utility to run a sweep on the server when
> things get unbearably slow. You don't have to shutdown the db to do
> a manual sweep; but, because sweep isn't a background process,
> you'll potentially shorten the *length* of time it takes to de-garb
> and actually *complete* it once in a while.
>
> ./heLen
>

We are running java 1.5 with Jaybird JDBC byu the way.

Thanks for the suggestion, i have done this gstat -h on a cron job and
it seems oldest snapshot is not been increased, ie garbage collection
has not run at all :( .

here are two runnings of gstat -h. One at 6pm, the other 15 hours
later at 9am. Oldest snapshot is the same on both.


6pm:
----------------
Database header page information:
Flags 0
Checksum 12345
Generation 10076932
Page size 4096
ODS version 10.1
Oldest transaction 10035365
Oldest active 10035366
Oldest snapshot 10035366
Next transaction 10076930
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 19
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Feb 6, 2006 12:40:09
Attributes

Variable header data:
*END*
-----------------

9am:
----------------
Database header page information:
Flags 0
Checksum 12345
Generation 15542154
Page size 4096
ODS version 10.1
Oldest transaction 10035365
Oldest active 10035366
Oldest snapshot 10035366
Next transaction 15542152
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 19
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Feb 6, 2006 12:40:09
Attributes

Variable header data:
*END*

-------------------------


should i have a cron job to force sweep of the database perhaps? or is
there a setting in firebird.conf?
is the best way to force a sweep gstat -sweep?

Any other suggestions would be much appreciated,

Thanks

Alex