Subject Firebird 2.1.3 Classic on Win2008
Author s3057043
Hello Group,

I am looking into a particular server that has been performing slowly at certain times of the week. The server is running Firebird 2.1 CS on Win 2008 R2 and on any given day is serving between 80 and 250 concurrent connections.

I have been looking into the usual suspects, such as long-running transactions, but the pattern seems inconsistent with them. If the problem were related to garbage collection, I would expect to see long disk queues and/or high CPU activity, but there is ample capacity on both while the server is behaving slowly. There is also ample free RAM and network capacity. I have therefore turned my attention to the lock manager to see if there is a bottleneck there.

In terms of settings, it is a vanilla 2.1.3 Classic server installation. No 'optimisations' have been made in firebird.conf. I have (attempted to) read up on the Lock Manager in Helen's book, and I have also read various posts on this list.

The full output of fb_lock_print is too large to post, but here is the header:

Version: 16, Active owner: 0, Length: 11042816, Used: 11039960
Semmask: 0x0, Flags: 0x0001
Enqs: 1044810631, Converts: 872959, Rejects: 1002497, Blocks: 4519508
Deadlock scans: 29, Deadlocks: 0, Scan interval: 10
Acquires: 1114629128, Acquire blocks: 720514295, Spin count: 0
Mutex wait: 64.6%
Hash slots: 1009, Hash lengths (min/avg/max): 2/ 10/ 22
Remove node: 0, Insert queue: 0, Insert prior: 0
Owners (146): forward: 276280, backward: 7712804
Free owners (311): forward: 6661712, backward: 3680632
Free locks (11101): forward: 5729696, backward: 530828
Free requests (111995): forward: 10720548, backward: 1576828
Lock Ordering: Enabled
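
To make the header easier to compare between days and servers, here is a minimal Python sketch that pulls the counters out of the text and derives a few ratios. The function names and the choice of ratios are my own, not anything fb_lock_print provides. I am assuming that the "Mutex wait" figure is simply Acquire blocks divided by Acquires; that assumption does reproduce the 64.6% reported above.

```python
import re

# Counter lines copied from the fb_lock_print header above.
HEADER = """\
Version: 16, Active owner: 0, Length: 11042816, Used: 11039960
Enqs: 1044810631, Converts: 872959, Rejects: 1002497, Blocks: 4519508
Acquires: 1114629128, Acquire blocks: 720514295, Spin count: 0
Hash slots: 1009, Hash lengths (min/avg/max): 2/ 10/ 22
"""


def parse_counters(text):
    """Collect every 'Name: integer' pair from the header into a dict."""
    pairs = re.findall(r"([A-Za-z][A-Za-z ]*?):\s*(\d+)", text)
    return {name.strip(): int(value) for name, value in pairs}


def derived_ratios(c):
    """Derive percentages from the raw counters (hypothetical helper)."""
    return {
        # Assumption: this is how the 'Mutex wait' percentage is computed.
        "acquire_block_pct": 100.0 * c["Acquire blocks"] / c["Acquires"],
        # How much of the lock table is in use.
        "table_used_pct": 100.0 * c["Used"] / c["Length"],
        # Share of enqueue requests that blocked or were rejected.
        "blocks_per_enq_pct": 100.0 * c["Blocks"] / c["Enqs"],
        "rejects_per_enq_pct": 100.0 * c["Rejects"] / c["Enqs"],
    }


counters = parse_counters(HEADER)
ratios = derived_ratios(counters)

if __name__ == "__main__":
    for name, value in ratios.items():
        print(f"{name}: {value:.2f}")
```

Running the same script against headers captured on a good day and a bad day should show which ratios actually move.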

One of the things I read up on was the mutex wait figure. One source referred to a "high" level, but I don't understand what counts as normal and what counts as high. At 64.6%, the figure is high compared with our other servers (5–11%) and with the same server on another day (~11%). I have also checked the 2.5 release notes and found changes that may or may not relate to (part of) the problem. I should note that the database has been moved to a different server between the lock print above and now, to rule out hardware completely.

Is there anything 'concerning' in the above lock print?
Are there configuration settings that may assist?
Are there further diagnostics I should be performing?
Should I be increasing the hash slots?
Should I be reducing the DefaultDbCachePages? (If so, what technique is used to work out a 'reasonable' value?)
I can see that deadlock scans have occurred 29 times (with no deadlocks found). This indicates that some requests have waited for more than 10 seconds, but I am not sure whether this is a cause or a consequence. Is there anything I can check in relation to this?
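
On the DefaultDbCachePages question, my understanding is that in Classic each connection keeps its own page cache, so the total cache memory grows with the number of connections. Below is a rough Python sketch of that arithmetic for the connection counts mentioned above. The 4096-byte page size is an assumption for illustration only (the actual page size of this database is not shown here), and 75 pages is, as far as I know, the Classic default in 2.1.

```python
def classic_cache_bytes(connections, cache_pages, page_size):
    """Total page-cache memory if every connection holds its own cache."""
    return connections * cache_pages * page_size


PAGE_SIZE = 4096   # assumption: not the confirmed page size of this database
CACHE_PAGES = 75   # believed to be the Classic default for DefaultDbCachePages

if __name__ == "__main__":
    for connections in (80, 250):
        total = classic_cache_bytes(connections, CACHE_PAGES, PAGE_SIZE)
        print(f"{connections} connections: {total / 2**20:.1f} MiB")
```

This only estimates memory use. It says nothing about how cache size affects lock manager traffic, which is the part I am unsure about.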

Kind regards,