firebird-support - RE: [firebird-support] Is Firebird Rock Solid?

Subject	RE: [firebird-support] Is Firebird Rock Solid?
Author	Helen Borrie
Post date	2005-10-01T02:37:07Z

At 11:33 PM 30/09/2005 +0100, you wrote:

>Are there any plans for a feature in FB 2.0 to allow GC to be deferred,
>for example until the middle of the night?

There are some changes in Fb 2 to allow more flexibility in how and when
Superserver does garbage collection. SS threads all client processes off a
single server process. Under SS, GC is a client process (albeit an
internally-controlled one).

The Classic architecture causes a single process to be started for each
client connection. Each client process cleans up any garbage hanging
around for tables it selects from. This might be garbage that it created,
or garbage created by someone else. It's the first transaction to select
from a table after a hard commit of updates or deletes from that table that
gets the "hit". This is referred to as "cooperative garbage collection".

>We run FB 1.5.2 Classic on a multi-CPU Windows 2003 machine. We can run
>large queries and procedures during the day without appearing to affect
>the performance of the database for other users, as we'd expect of
>Classic on a multi-CPU machine :-) But the period of GC *after* these
>queries/procedures doesn't seem to behave as nicely, and we think it
>affects database performance for all users. So effectively we cannot run
>anything large during the day after all :-(

If cooperative GC is causing a serious hit for everyone, then it's highly
likely that misbehaving applications, or ill-advised indexes, or RAM
starvation (one or any of these) could be slowing things down. Control of
transactions is entirely up to the client application and long-running
transactions are by far the most frequent cause of large buildups of
garbage. As long as a client application requires a record to be
"interesting", that record's obsolete pieces can't be garbage
collected. So, if you have code that keeps records in frequently-accessed
tables interesting for hours, then someone is going to get hit with GC once
the users log out and go to lunch!

The culprits that cause these log-jams are 1) interfaces that call Commit
Retaining instead of Commit (since work committed with CR always remains
interesting); 2) bulk deletes capriciously executed (these need to be
completed quickly with a hard commit and followed immediately by a select
*); automatic sweeping disabled without any attention being given to
manual sweeping in lieu.

Delphi and C++ Builder applications are an exceedingly common source of
problem 1). So common, in fact, that it's virtually certain to be the
first trap that people fall into. Commit Retaining was created for Delphi,
along with a client-side hack called AutoCommit. It's not an entirely bad
thing, per se, if used with full understanding of its effects and with full
fallback against these effects coded in by the developer. However, since it
was actually designed to shield Paradox developers from the "problem" of
having to understand transactions (or even from learning SQL!) when they
moved on to InterBase, it has become one of most misused features in
Delphi/Firebird/InterBase development.

On the subject of resource usage, Classic has pluses if you want to utilise
your SMP machines. On the downside, each connection is allocated static DB
cache. The default for Classic is 75 pages, and this cache is allocated
statically from RAM (75 * page_size).

Now, if you are running the Guardian with Classic, it is possible that
connections are never going away (which they are meant to). Your
progressive loss of performance could be wholly or partly due to Guardian
restarting each client process whenever the user logs out, and holding
unused caches and other resources open for these "phantom" connections.

Check whether your server is starting Guardian and hence Guardian is
controlling the fb_inet_server parent process. If so, use instsvc.exe from
the command line to fix it. You'll need to get all clients off, shut down
both services, remove them and then reinstall the fb_inet_server service
without Guardian. Read the docs for this program (in the \doc directory)
before you do this.

Just don't believe anyone, no matter how erudite, who says "this can't
happen". It can. IMNSHO, Guardian and Classic simply do not belong
together. It is my strongly held view that the installation of Guardian as
a service for a Classic installation should be blocked by the MIS program
(at least!); or, better, the option for instsvc.exe to install Guardian
for Classic should be made no-op.

Also get out the abacus and work out how much static RAM each of your
connections is consuming. Accreted DB cache will give a rough (under)
estimate, since each connection has other (small) static memory structures
created for it.

Remember, too, that queries that do sorts (ORDER BY, GROUP BY) will use RAM
if is available -- greedily and indiscriminately. A few processes each
consuming RAM for sorted output of a huge set in a long-running transaction
will soon deprive the rest from being able to use RAM for sorts and then
the engine will resort to using temporary files on disk (and NOT RAM
cache). Depending on the configuration of your sort space in the
filesystem, this could be an i/o bottleneck.

You can lighten the competition for I/O by configuring sort space on a
"quiet" disk, away from the disks where databases and RAM cache are being
operated on. If you don't configure sort space at all, the Windows \tmp
directory (usually on the C drive, if it hasn't been explicitly configured
to be somewhere more sensible) is used. \tmp is typically clogged up with
all sorts of neolithic cruft from apps that don't clean up, so it's a
likely bottleneck on a system where Classic is already straining
resources; and it's likely to be a source of fragmentation on the C drive
in any event. Add that to the "typical" Windows configuration, where
pagefile.sys is sitting, neglected and overlooked, on the C drive, and
you've got more than a single bottleneck to address...

./hb