firebird-support - Re: [firebird-support] Re: Firebird 100% CPU Usage After Backup

Subject	Re: [firebird-support] Re: Firebird 100% CPU Usage After Backup
Author	Olivier Mascia
Post date	2004-11-06T08:56:21Z

Helen,

Helen Borrie wrote:

> Graham,
>
> Are you certain that all the backups are finished when you observe this
> behaviour?
>
> It might be one of your databases undergoing a large amount of garbage
> collection during the course of extracting the data into the portable data
> format (gbak does this...)
>
> ./heLen

I have seen that behaviour too.
I have not been able to report it meaningfully because I could not
clearly identify the conditions under which it happens. I first thought
it might be related to the events & lost connections issue discussed on
the developers' lists and recently fixed for 1.5.2. However, I have
experienced it on a server with a single DB and a single client app
using no events. In this configuration, an application component runs as
a service on the server and runs an unattended backup (through the
service API) every 6 hours of run time.

Out of 4 clearly identified occurrences of this problem, 3 times we found
fbserver.exe (1.5.1) eating CPU (the server is a dual processor - no
hyper-threading). The fourth occurrence was a bit more special. It was
also discovered when users said "oh the server has seemed so slow for the
last 30 minutes". Upon checking, there was no mad CPU usage, but the
OS page file was nearly full. Checking further, fbserver.exe was
responsible for a bit more than 1.4 GB (yes, giga) of memory, which
explained the disk thrashing that was going on.

This was on a Windows 2000 Server SP4 with absolutely all Microsoft
fixes and updates.

Though the backups are run every 6 hours, the problem happens quite
rarely, certainly not twice a day, but more like once every other week
or so.

For the last week, we have replaced fbserver 1.5.1 with 1.5.2 RC1 and
are monitoring the server to see whether the problem happens again. It
has not recurred yet, but this alone does not prove that the problem has
been fixed. If only I could find a way to reproduce it systematically...
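
To catch the next occurrence as it happens rather than from user reports,
a small watchdog could sample fbserver.exe and flag the two symptoms seen
so far (runaway CPU, runaway memory). Below is a rough Python sketch of
the classification step only. The function names and thresholds are
illustrative assumptions, and how the samples are collected (Performance
Monitor, a script, etc.) is left open.

```python
def classify_sample(cpu_percent, memory_bytes,
                    cpu_limit=90.0, memory_limit=1024 ** 3):
    """Label one (CPU %, memory) sample of the server process.

    The default limits are guesses, not values from any Firebird
    documentation; tune them to the machine being watched.
    """
    symptoms = []
    if cpu_percent >= cpu_limit:
        symptoms.append("cpu")
    if memory_bytes >= memory_limit:
        symptoms.append("memory")
    return symptoms


def sustained(samples, symptom, min_consecutive=5):
    """Return True if `symptom` appears in at least `min_consecutive`
    samples in a row, so that a short spike is not reported."""
    run = 0
    for cpu_percent, memory_bytes in samples:
        if symptom in classify_sample(cpu_percent, memory_bytes):
            run += 1
        else:
            run = 0
        if run >= min_consecutive:
            return True
    return False
```

Requiring several consecutive samples is a deliberate choice: a backup
legitimately uses CPU for a short while, so only a sustained condition is
worth an alert.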

Oh yes, the backups always finish within a short time. This
particular DB is only about 400 MB for now, consisting mainly of small
blobs (from 10 to 100 KB - fax tiff files). Once a record has been
inserted, it is never deleted except in rare cases. An inserted record
(fax scheduled) receives one or two updates within minutes of its
insertion (fax in progress, fax sent/failed), then never again (the fax
is considered archived). There are very few opportunities for
concurrency over the same rows, and transactions are kept open only very
briefly, so I expect back versions to be minimal in this
scenario and garbage collection to be fairly straightforward and
quick. Running a sweep against the DB is nearly instant, whenever I
run it. So I really don't think the issue could be related to
backup-time garbage collection.
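
One way to test the garbage-collection hypothesis directly would be to
run the backups with gbak's -g switch, which inhibits garbage collection
during the backup, and see whether the problem still shows up. Below is a
minimal Python sketch of building such a command line. The function
names, paths and credentials are placeholders, and it assumes gbak is on
the PATH.

```python
import subprocess


def build_gbak_backup(database, backup_file, user, password,
                      collect_garbage=True):
    """Return the gbak argument list for backing up `database`."""
    cmd = ["gbak", "-b", "-user", user, "-password", password]
    if not collect_garbage:
        # -g tells gbak not to garbage-collect while it reads the data
        cmd.append("-g")
    cmd += [database, backup_file]
    return cmd


def run_backup(cmd):
    """Run gbak and return its exit code (needs gbak installed)."""
    return subprocess.run(cmd).returncode
```

If the symptom still appears with garbage collection inhibited, that
would count against the backup-time garbage collection explanation.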

I would more readily have suspected a problem with the Service API,
but Graham reports that he is running gbak from a batch file. So I'm a
bit out of ideas for now. Just waiting for 1.5.2 RC1 to exhibit the
problem again... or not.

I feel really sad not being able to produce a reproducible test case
other than one that happens once every week or two (sometimes sooner,
sometimes later).

--
Olivier Mascia