Subject | Periodic database slowdown - troubleshooting steps? |
---|---|
Author | Bob Murdoch |
Post date | 2012-09-17T14:27:07Z |
I have an FB 2.1.5 Classic server running on a Windows 2003 server,
with a single hard drive for the operating system, and a 3 disk raid 5
array for the database. We have one database on this machine, which
is a dialect 1 database that was started on IB6.0 many years ago,
currently at 90GB. We have sweep disabled, and each night run gbak,
gfix sweep, as well as reindex all tables via a script.
The database has very little OLTP, and is mostly used for reporting
and serving web pages to internal business users. We do alot of ETL
starting very early each morning, and create a mix of scheduled
reports as well as allowing users to specify parameters to create
pre-designed reports in an ad-hoc manner.
Once or twice per month, the system slows down tremendously. One ETL
process typically runs at a pace of about 1000 records per 10 seconds.
During these slow periods, the same ETL will run 1000 transactions per
60-80 seconds. When processing a file with 1mil+ records, this slow
down costs us hours.
I have not been able to determine the reason for these slow periods.
They do not coincide with higher cpu or disk usage most of the time
I'm seeing very little usage of anything disk/cpu/network/memory. I
do see more connections to the database during these periods
typically we have 10 to 15 connections, and the number may double
during the problem times. This is due to the fact that the reports
that users are requesting are taking longer to run, and our
pooled-connection application server or web server creates more
connections to satisfy new user requests.
I do see more queries running by the time I am notified of the
problem. Again though, this is a coincidence of the slowdown and not
the cause queries are taking longer to run, and therefore we have
more chance of overlapping user requests than usual.
Most of the time we need to reboot once or even twice to fix the
problem. That is not a viable long-term solution though, and I'm
looking for more ideas to determine what may be happening. Any ideas
would be most helpful.
I have included gstat h output of the database as it is suffering
from the issue below:
Database header page information:
Flags 0
Checksum 12345
Generation 43136192
Page size 8192
ODS version 11.1
Oldest transaction 40789582
Oldest active 41467442
Oldest snapshot 41467442
Next transaction 42431040
Bumped transaction 1
Sequence number 0
Next attachment ID 705070
Implementation ID 16
Shadow count 0
Page buffers 2048
Next header page 0
Database dialect 1
Creation date May 2, 2009 22:22:39
Attributes force write, no reserve
Variable header data:
Sweep interval: 0
*END*
Thank you for your assistance,
Bob M..
[Non-text portions of this message have been removed]
with a single hard drive for the operating system, and a 3 disk raid 5
array for the database. We have one database on this machine, which
is a dialect 1 database that was started on IB6.0 many years ago,
currently at 90GB. We have sweep disabled, and each night run gbak,
gfix sweep, as well as reindex all tables via a script.
The database has very little OLTP, and is mostly used for reporting
and serving web pages to internal business users. We do alot of ETL
starting very early each morning, and create a mix of scheduled
reports as well as allowing users to specify parameters to create
pre-designed reports in an ad-hoc manner.
Once or twice per month, the system slows down tremendously. One ETL
process typically runs at a pace of about 1000 records per 10 seconds.
During these slow periods, the same ETL will run 1000 transactions per
60-80 seconds. When processing a file with 1mil+ records, this slow
down costs us hours.
I have not been able to determine the reason for these slow periods.
They do not coincide with higher cpu or disk usage most of the time
I'm seeing very little usage of anything disk/cpu/network/memory. I
do see more connections to the database during these periods
typically we have 10 to 15 connections, and the number may double
during the problem times. This is due to the fact that the reports
that users are requesting are taking longer to run, and our
pooled-connection application server or web server creates more
connections to satisfy new user requests.
I do see more queries running by the time I am notified of the
problem. Again though, this is a coincidence of the slowdown and not
the cause queries are taking longer to run, and therefore we have
more chance of overlapping user requests than usual.
Most of the time we need to reboot once or even twice to fix the
problem. That is not a viable long-term solution though, and I'm
looking for more ideas to determine what may be happening. Any ideas
would be most helpful.
I have included gstat h output of the database as it is suffering
from the issue below:
Database header page information:
Flags 0
Checksum 12345
Generation 43136192
Page size 8192
ODS version 11.1
Oldest transaction 40789582
Oldest active 41467442
Oldest snapshot 41467442
Next transaction 42431040
Bumped transaction 1
Sequence number 0
Next attachment ID 705070
Implementation ID 16
Shadow count 0
Page buffers 2048
Next header page 0
Database dialect 1
Creation date May 2, 2009 22:22:39
Attributes force write, no reserve
Variable header data:
Sweep interval: 0
*END*
Thank you for your assistance,
Bob M..
[Non-text portions of this message have been removed]