firebird-support - Bizarre Firebird Server "Lock-up"/Freeze

Subject	Bizarre Firebird Server "Lock-up"/Freeze
Author	david_parcelperfect
Post date	2008-02-27T12:27:23Z

Hello,

We are currently experiencing problems with a few Firebird
installations where the server appears to hang for no apparent reason
and accepts no further connections. I would be very grateful if anybody
could provide some insight into this anomaly.

We are using Firebird 2.0.1 Classic Server (CS) on Linux (CentOS 4.3).
The server handles anywhere between 50 and 150 connections at any
given time. Xinetd has been configured to allow 600 instances, so the
instance limit cannot be the problem.
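
In Classic Server on Linux, xinetd starts a separate fb_inet_server
process for every incoming connection, so the xinetd "instances"
attribute effectively caps concurrent connections. A minimal Python
sketch of that headroom check follows. The service block is a
hypothetical example (only the value 600 comes from this post), and the
parser handles only plain "key = value" lines.

```python
import re

# Hypothetical xinetd service entry for Firebird Classic. Only the
# "instances = 600" value comes from the post; the other lines are
# illustrative and the real file (e.g. under /etc/xinetd.d/) may differ.
SAMPLE_XINETD = """
service gds_db
{
    socket_type = stream
    wait        = no
    server      = /opt/firebird/bin/fb_inet_server
    instances   = 600
}
"""


def parse_xinetd_attrs(text: str) -> dict:
    """Collect plain 'key = value' assignments from an xinetd service block.

    The '+=' and '-=' forms are ignored by this simple sketch.
    """
    attrs = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if m:
            attrs[m.group(1)] = m.group(2)
    return attrs


def instances_headroom(attrs: dict, peak_connections: int):
    """Spare fb_inet_server slots at peak load, or None when unlimited.

    xinetd treats a missing 'instances' attribute as no limit.
    """
    value = attrs.get("instances", "UNLIMITED")
    if value.upper() == "UNLIMITED":
        return None
    return int(value) - peak_connections


if __name__ == "__main__":
    attrs = parse_xinetd_attrs(SAMPLE_XINETD)
    print(instances_headroom(attrs, peak_connections=150))
```

At the stated peak of 150 connections this leaves 450 spare instances.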

Connections may come from a variety of sources:
· Various Win32 applications written in Delphi or C#
· Various PHP scripts running on the Linux server

Most client applications use Firebird events in one way or another. At
times, a large number of clients may be listening for events. These
events are posted on specific updates of specific tables and may
therefore occur very often.

Most of the time the server runs without problems. Below is some
analysis taken after one of these "lock-ups":

GSTAT Header:
=============
Database header page information:
Flags 0
Checksum 12345
Generation 2300266
Page size 4096
ODS version 11.0
Oldest transaction 2135554
Oldest active 2238537
Oldest snapshot 2135707
Next transaction 2300247
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 19
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Dec 20, 2007 22:10:59
Attributes force write

Variable header data:
Sweep interval: 20000
*END*
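
The transaction counters in this header can be compared directly. A
minimal Python sketch that extracts them from gstat-style text and
computes the usual gaps is shown below. The sweep check assumes the
commonly described rule that automatic sweep runs when Oldest snapshot
minus Oldest transaction exceeds the sweep interval; this has not been
verified against the 2.0.1 source.

```python
import re

# Header fields copied from the gstat output in the post.
GSTAT_HEADER = """
Oldest transaction 2135554
Oldest active 2238537
Oldest snapshot 2135707
Next transaction 2300247
"""

FIELDS = {
    "Oldest transaction": "oit",
    "Oldest active": "oat",
    "Oldest snapshot": "ost",
    "Next transaction": "next",
}


def parse_gstat_header(text: str) -> dict:
    """Pull the transaction counters out of 'gstat -h' style output."""
    values = {}
    for label, key in FIELDS.items():
        pattern = rf"^\s*{re.escape(label)}\s+(\d+)\s*$"
        m = re.search(pattern, text, flags=re.MULTILINE)
        if m:
            values[key] = int(m.group(1))
    return values


def transaction_gaps(h: dict) -> dict:
    return {
        # Transactions started since the oldest still-active one began.
        "next_minus_oat": h["next"] - h["oat"],
        # Distance between the newest and the oldest interesting transaction.
        "next_minus_oit": h["next"] - h["oit"],
        # Gap assumed here to drive automatic sweep.
        "ost_minus_oit": h["ost"] - h["oit"],
    }


def sweep_due(h: dict, sweep_interval: int) -> bool:
    """Assumed rule: auto-sweep fires when OST - OIT exceeds the interval."""
    return sweep_interval > 0 and h["ost"] - h["oit"] > sweep_interval


if __name__ == "__main__":
    header = parse_gstat_header(GSTAT_HEADER)
    print(transaction_gaps(header), sweep_due(header, 20000))
```

On the figures above, Next transaction minus Oldest active is 61710,
which suggests a long-running transaction that stayed open while many
newer ones started. Oldest snapshot minus Oldest transaction is only
153, far below the sweep interval of 20000.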

Selected FB_Lock_Print Data:
============================
*The lock header block below does not appear to have any serious
issues that I can identify:

LOCK_HEADER BLOCK
Version: 16, Active owner: 0, Length: 851968, Used: 840364
Lock manager pid: 3888
Semmask: 0x2CE8, Flags: 0x0001
Enqs: 2194694426, Converts: 318743, Rejects: 187646, Blocks: 364828
Deadlock scans: 0, Deadlocks: 0, Scan interval: 10
Acquires: 2203107123, Acquire blocks: 41672020, Spin count: 0
Mutex wait: 1.9%
Hash slots: 101, Hash lengths (min/avg/max): 9/ 17/ 31
Remove node: 0, Insert queue: 0, Insert prior: 0
Owners (55): forward: 11932, backward: 808568
Free owners (6): forward: 684840, backward: 497312
Free locks (867): forward: 148736, backward: 606144
Free requests (1219): forward: 822976, backward: 145860
Lock Ordering: Enabled
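
The ratios behind "no serious issues" can be made explicit. A minimal
Python sketch that reads a few fields from the LOCK_HEADER BLOCK text
and computes how full the lock table is and what share of acquisitions
blocked is shown below. The field names are taken from the output above,
and the sketch assumes the same "Name: value" layout.

```python
import re

# Lines copied from the LOCK_HEADER BLOCK in the post.
LOCK_HEADER = """
Version: 16, Active owner: 0, Length: 851968, Used: 840364
Acquires: 2203107123, Acquire blocks: 41672020, Spin count: 0
Mutex wait: 1.9%
Hash slots: 101, Hash lengths (min/avg/max): 9/ 17/ 31
"""


def field(text: str, name: str) -> int:
    """Return the integer following '<name>:' in fb_lock_print output."""
    m = re.search(rf"\b{re.escape(name)}:\s*(\d+)", text)
    if m is None:
        raise KeyError(name)
    return int(m.group(1))


def lock_header_ratios(text: str) -> dict:
    length = field(text, "Length")
    used = field(text, "Used")
    acquires = field(text, "Acquires")
    acquire_blocks = field(text, "Acquire blocks")
    slots = field(text, "Hash slots")
    m = re.search(r"Hash lengths \(min/avg/max\):\s*(\d+)/\s*(\d+)/\s*(\d+)", text)
    return {
        # Share of the lock table currently in use.
        "used_fraction": used / length,
        # Share of lock-table acquisitions that had to wait.
        "acquire_block_fraction": acquire_blocks / acquires,
        "hash_slots": slots,
        "avg_hash_chain": int(m.group(2)) if m else None,
    }


if __name__ == "__main__":
    print(lock_header_ratios(LOCK_HEADER))
```

With these numbers about 98.6% of the 851968-byte lock table is in use
and roughly 1.9% of acquisitions blocked, which agrees with the
reported Mutex wait of 1.9% (so that figure appears to be computed the
same way). The 101 hash slots with an average chain length of 17 can be
compared against the LockHashSlots setting in firebird.conf.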

*This is the owner block of the lock manager process (its process id,
3888, matches the "Lock manager pid" in the header above). Of
particular concern is the "hung" on its Flags line.

OWNER BLOCK 11932
Owner id: 3888, type: 1, flags: 0x04, pending: 0, semid: 1
Process id: 3888, UID: 0x0 Alive
Flags: 0x44 hung
Requests: *empty*
Blocks: *empty*

*This is a sample owner block for a client process. Several processes
show "hung" on their Flags line. (I have also seen "hung" on processes
during problem-free periods, so I assume it is not a major issue?)

OWNER BLOCK 173884
Owner id: 4131, type: 3, flags: 0x20, pending: 0, semid: 11 (available)
Process id: 4131, UID: 0x1F5 Alive
Flags: 0x60 hung wake
Requests (248): forward: 164132, backward: 761428
Blocks: *empty*
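
When comparing a "lock-up" with a problem-free period, it may help to
list which owners carry the "hung" flag. A small Python sketch that
scans fb_lock_print text for OWNER BLOCK sections is shown below. It
assumes the layout shown above (one "Flags:" line per block, at the
start of a line) and does not attempt to interpret what the flag bits
mean.

```python
import re

# Two owner blocks copied from the fb_lock_print output in the post.
SAMPLE = """
OWNER BLOCK 11932
Owner id: 3888, type: 1, flags: 0x04, pending: 0, semid: 1
Process id: 3888, UID: 0x0 Alive
Flags: 0x44 hung
Requests: *empty*
Blocks: *empty*

OWNER BLOCK 173884
Owner id: 4131, type: 3, flags: 0x20, pending: 0, semid: 11 (available)
Process id: 4131, UID: 0x1F5 Alive
Flags: 0x60 hung wake
Requests (248): forward: 164132, backward: 761428
Blocks: *empty*
"""


def hung_owners(text: str) -> list:
    """Return (process id, flags) for every OWNER BLOCK whose Flags line says 'hung'."""
    found = []
    # Split on the 'OWNER BLOCK <offset>' headings; chunk 0 precedes the first block.
    chunks = re.split(r"^\s*OWNER BLOCK\s+\d+\s*$", text, flags=re.MULTILINE)
    for chunk in chunks[1:]:
        pid = re.search(r"Process id:\s*(\d+)", chunk)
        # Case-sensitive and anchored, so the lower-case 'flags:' on the
        # 'Owner id' line is not picked up.
        flag_line = re.search(r"^\s*Flags:\s*(.+)$", chunk, flags=re.MULTILINE)
        if pid and flag_line and "hung" in flag_line.group(1).split():
            found.append((int(pid.group(1)), flag_line.group(1).strip()))
    return found


if __name__ == "__main__":
    for pid, flags in hung_owners(SAMPLE):
        print(pid, flags)
```

Running the same function over dumps taken during normal operation and
during a "lock-up" would show whether the set of hung owners differs.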

I suspect this could be related to CORE-1410 "Deadlock in classic
server on Linux" (http://tracker.firebirdsql.org/browse/CORE-1410).
However, the load on the server is not excessively high, as the lock
header block above shows.

All Firebird configuration options were left at their defaults. The
database is only 1.5 GB, the server has 160 GB of free disk space, and
3 GB of memory is available.
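
In Classic Server each connection is a separate process with its own
page cache, so total cache memory scales with the connection count. A
back-of-the-envelope Python sketch is shown below. It assumes 75 cached
pages per connection, which is believed to be the Classic default for
DefaultDbCachePages in firebird.conf when the header shows Page buffers
0; treat this as an assumption and check the server's actual setting.

```python
# Rough page-cache estimate for Classic Server, where every connection is
# a separate fb_inet_server process with its own page cache.
# ASSUMPTION: 75 pages per connection (believed Classic default for
# DefaultDbCachePages); verify against the server's firebird.conf.

PAGE_SIZE = 4096   # "Page size" from the gstat header
CACHE_PAGES = 75   # assumed Classic default, see note above


def classic_cache_bytes(connections: int,
                        pages: int = CACHE_PAGES,
                        page_size: int = PAGE_SIZE) -> int:
    """Total bytes of page cache across all Classic processes."""
    return connections * pages * page_size


if __name__ == "__main__":
    for n in (50, 150):
        mib = classic_cache_bytes(n) / 2**20
        print(f"{n} connections: {mib:.1f} MiB of page cache")
```

For 150 connections this is 46,080,000 bytes (about 44 MiB), so page
cache alone seems unlikely to exhaust 3 GB; per-process memory outside
the cache is not modelled here.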

Rebooting the Linux server resolves the problem, as does killing all
Firebird processes (fb_inet_server). The latter is definitely not
optimal, especially since I believe it could cause database corruption
(see CORE-1439).

I look forward to any comments or suggestions!

Thanks,
David