Subject Periodic database slowdown - troubleshooting steps?
Author Bob Murdoch
I have an FB 2.1.5 Classic server running on a Windows 2003 server,
with a single hard drive for the operating system, and a 3 disk raid 5
array for the database. We have one database on this machine, which
is a dialect 1 database that was started on IB6.0 many years ago,
currently at 90GB. We have sweep disabled, and each night run gbak,
gfix –sweep, as well as reindex all tables via a script.



The database has very little OLTP, and is mostly used for reporting
and serving web pages to internal business users. We do alot of ETL
starting very early each morning, and create a mix of scheduled
reports as well as allowing users to specify parameters to create
pre-designed reports in an ad-hoc manner.



Once or twice per month, the system slows down tremendously. One ETL
process typically runs at a pace of about 1000 records per 10 seconds.
During these slow periods, the same ETL will run 1000 transactions per
60-80 seconds. When processing a file with 1mil+ records, this slow
down costs us hours.



I have not been able to determine the reason for these slow periods.
They do not coincide with higher cpu or disk usage – most of the time
I'm seeing very little usage of anything – disk/cpu/network/memory. I
do see more connections to the database during these periods –
typically we have 10 to 15 connections, and the number may double
during the problem times. This is due to the fact that the reports
that users are requesting are taking longer to run, and our
pooled-connection application server or web server creates more
connections to satisfy new user requests.



I do see more queries running by the time I am notified of the
problem. Again though, this is a coincidence of the slowdown and not
the cause – queries are taking longer to run, and therefore we have
more chance of overlapping user requests than usual.



Most of the time we need to reboot once or even twice to fix the
problem. That is not a viable long-term solution though, and I'm
looking for more ideas to determine what may be happening. Any ideas
would be most helpful.



I have included gstat –h output of the database as it is suffering
from the issue below:



Database header page information:

Flags 0

Checksum 12345

Generation 43136192

Page size 8192

ODS version 11.1

Oldest transaction 40789582

Oldest active 41467442

Oldest snapshot 41467442

Next transaction 42431040

Bumped transaction 1

Sequence number 0

Next attachment ID 705070

Implementation ID 16

Shadow count 0

Page buffers 2048

Next header page 0

Database dialect 1

Creation date May 2, 2009 22:22:39

Attributes force write, no reserve



Variable header data:

Sweep interval: 0

*END*



Thank you for your assistance,



Bob M..



[Non-text portions of this message have been removed]