Subject Firebird stop responding - more information
Author mivi71dk
Hi

Before Christmas we had several incidents on one of our servers where Firebird stopped responding.
I posted here to get more help and information, but we never reached a solution.

The server is this:
- Server with 4 cores, of which 3 are disabled.

- Linux version 2.6.18-6-amd64 (Debian 2.6.18.dfsg.1-26etch1) (dannf@...) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Thu Nov 5 03:49:04 UTC 2009

- Firebird 2.1.3 SuperServer 64-bit, downloaded from here:
http://sourceforge.net/projects/firebird/files/firebird-linux-amd64/2.1.3-Release/FirebirdSS-2.1.3.18185-0.amd64.tar.gz/download


In the 2-3 days before Christmas the server stopped responding 3-4 times. No one could connect, no one could do anything, and the only solution was to kill the Firebird process.
After that it worked again.

What we have done so far is:

- A complete backup / restore cycle.
- Upgraded Linux (because of the FUTEX issue described below)

There are 2 DBs running on this server.
One is very small and contains 1 table with 52 records.
It is only updated when we create a new store, which we last did in October. Otherwise we only read from this DB.

The other is a 25 GB DB with about 100 connections.
When the server stops, it happens while the DB is under rather heavy use.

My colleague has spent some time googling this.
He found reports of problems with FUTEX, which as far as we can tell has something to do with threads.
MySQL has had similar problems, according to these links:

http://bugs.mysql.com/bug.php?id=25232
http://bugs.mysql.com/bug.php?id=29560
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10-rc1/2.6.10-rc1-mm5/broken-out/futex_wait-fix.patch

On December 28, around 18:20, the server stopped once more.
The Firebird process was using about 15% CPU.
A trace of the process returned this:

futex(0x2aaab5bfeb2c, FUTEX_WAIT, 3130909, NULL <unfinished ...>
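
A single line like this only says that one thread is sleeping in FUTEX_WAIT on one address until another thread wakes it. If a longer trace covering all threads were captured, it might help to see whether many threads wait on the same address. The following is only a minimal sketch for that, assuming strace-style text output; the function names `parse_futex_line` and `count_waits_by_address` are made up for illustration and are not part of Firebird or strace.

```python
import re
from collections import Counter

# Matches the start of a futex call in strace-style output, e.g.
#   futex(0x2aaab5bfeb2c, FUTEX_WAIT, 3130909, NULL <unfinished ...>
FUTEX_RE = re.compile(r"futex\((0x[0-9a-fA-F]+),\s*([A-Z_|]+),\s*(\d+)")


def parse_futex_line(line):
    """Return (address, operation, value) for a futex line, else None."""
    m = FUTEX_RE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), int(m.group(3))


def count_waits_by_address(lines):
    """Count FUTEX_WAIT calls per futex address (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        parsed = parse_futex_line(line)
        if parsed and parsed[1].startswith("FUTEX_WAIT"):
            counts[parsed[0]] += 1
    return counts


sample = "futex(0x2aaab5bfeb2c, FUTEX_WAIT, 3130909, NULL <unfinished ...>"
print(parse_futex_line(sample))
```

Many waits piling up on one address would point towards a contended lock, but that is an interpretation and not something the one line above can prove.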

There is nothing in the Firebird log or the Linux logs.

For now we believe there is a problem somewhere inside Firebird that causes this situation. It really seems as if Firebird enters a loop that it cannot get out of.