Subject Re: [firebird-support] Re: Firebird Unstable under load
Author Helen Borrie
At 11:55 AM 23/02/2006, you wrote:

>What you need to figure out is why the server is terminating
>abnormally. This can happen if you have a faulty UDF or due to
>hardware failure or software bug. Maybe someone else can comment on
>what may cause this error message.

Well, in these circumstances it means the server is repeatedly
crashing and restarting before enough time has elapsed for NOS to
detect the lost connections. At a guess, I'd say you loaded up the
superserver with more connections than you have the memory to handle
so the server is crashing at some point when memory is exhausted.

> > This happens periodically usually in batches (we have ~ 200 simultaneous
> > connections happening this way under load)

OK, let's look at the load. 200 simultaneous connections use ~2 MB
RAM each just to exist. That's 400 MB of RAM. Then, also in RAM,
you've got the page cache (possibly over-configured?), say 2048 pages
* 4 KB = 8 MB if you're using the defaults (work it out yourself if it
has been reconfigured), the lock table (growing dynamically as each connection
does stuff to tables), the undo logs for inserts (which won't be
cleaned out until the transactions are ended by hard commits or hard
rollbacks...commit retaining won't do it...). You've got the
transaction log there in RAM, too, representing at least 200
transactions, which will keep growing as long as you're doing lots of
stuff with long-running transactions.
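
To put rough numbers on the above, here is a minimal back-of-envelope sketch. It uses only the figures quoted in this message (about 2 MB per connection, a default cache of 2048 pages of 4 KB). The function name and the other_mb parameter are hypothetical; other_mb stands in for the lock table, undo logs and transaction bookkeeping, whose sizes are not known here, so the result is a floor and not a measurement.

```python
MB = 1024 * 1024
KB = 1024


def estimate_ram_bytes(connections=200, per_connection_mb=2,
                       cache_pages=2048, page_size_kb=4, other_mb=0):
    """Rough lower bound on server RAM, in bytes.

    other_mb is a placeholder for the lock table, undo logs and
    transaction bookkeeping; their real sizes are unknown here.
    """
    connection_bytes = connections * per_connection_mb * MB
    cache_bytes = cache_pages * page_size_kb * KB
    return connection_bytes + cache_bytes + other_mb * MB


def to_mb(num_bytes):
    """Convert bytes to megabytes."""
    return num_bytes / MB


if __name__ == "__main__":
    # 200 * 2 MB for connections plus 2048 * 4 KB for the cache
    print(f"{to_mb(estimate_ram_bytes()):.0f} MB")  # prints "408 MB"
```

Plug in your own connection count, cache setting and a guess for other_mb to see how quickly the total climbs before the application instances and the OS are even counted.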

You're running mega-multiple instances of your application locally,
so there's a hungry monster for you, * 200, with (you hope) enough
sockets to support 200 open localhost ports...if you've got a
succession of crashed loopback connections hanging around waiting for
keep-alive timeouts (default 2 hours) that could probably stack up to
a lot of dead-but-busy sockets hanging about. Sockets consume RAM,
too. You're also running Windows and network services
and...whatever...well, you'd better have lots and lots of RAM.

Superserver can address up to 2 GB of whatever RAM you have, if it's
available; after that, it's "Curtains!" (and "install Classic" is on
your To-Do list).

All this stuff about blown resources is conjecture, of
course. You've not said what the hardware is, what resources you
have, why you need to make 200 connections to localhost, or what is
happening inside these connections.

> >
> > These logs [showing server crashes and restarts at 2-7 minute
> intervals] seem to be unrelated to the PHP connection
> problems, they are not happening at exactly the same time.
> >

Um...crash-restart-crash-restart--- It's hard to see how they could
happen at exactly the same time.
But, basically, if the log is only showing messages from the Guardian
that were kicked off by a client request (which is what you showed
us) and you can rule out memory corruption from outside (bad UDF)
then you can conclude that the database server, before each crash,
has reached the state where it couldn't get enough resources even to
send a log message that might help to pin down the exact
crisis. Possibly a log entry preceding the first crash might lead somewhere.
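
If the log is long, a small script can pull out whatever was written just before the first crash. This is only a sketch under two assumptions that you should check against your own firebird.log: that entries are separated by blank lines, and that the Guardian's crash message contains the text "terminated abnormally". The function names are hypothetical and the marker is a parameter, so adjust it to match what your log actually says.

```python
def split_entries(log_text):
    """Split log text into entries separated by one or more blank lines.

    Assumes blank-line-separated entries; each entry's lines are
    stripped and joined with single spaces.
    """
    entries, current = [], []
    for line in log_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            entries.append(" ".join(current))
            current = []
    if current:
        entries.append(" ".join(current))
    return entries


def entries_before_first_crash(log_text, marker="terminated abnormally",
                               context=3):
    """Return up to `context` entries preceding the first entry that
    contains `marker`, or an empty list if no entry contains it."""
    entries = split_entries(log_text)
    for index, entry in enumerate(entries):
        if marker in entry:
            return entries[max(0, index - context):index]
    return []
```

Read the log file into a string and pass it to entries_before_first_crash; anything it returns is a candidate for the "log entry preceding the first crash" mentioned above.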

./heLen