Subject FB SS 2.0.1 on Gentoo Hangs after 30 minutes and does not accept any connections
Author Stan
Hi All,

I am running the default ebuild of Firebird 2.0.1:

stan_gentoo # emerge --search firebird:
* dev-db/firebird
Latest version available: 2.0.1.12855.0-r4
Latest version installed: 2.0.1.12855.0-r4
Size of files: 22,571 kB
Homepage: http://firebird.sourceforge.net/
Description: A relational database offering many ANSI SQL-99 features
License: Interbase-1.0

on a new Gentoo system:
stan_gentoo bin # uname -a
Linux stan_gentoo 2.6.21-gentoo-r4 #3 Sun Aug 5 01:46:07 PDT 2007 i686 AMD Athlon(tm) XP 2400+ AuthenticAMD GNU/Linux

The machine has 512 MB of RAM and a single IDE disk.

My database starts out empty, and the application inserts a bunch
of data using stored procedures. After 10 minutes to 1 hour of
intermittent high load on the database, the fbserver process hangs,
and stops accepting connections. I can't even connect with isql from
the Linux box where it's installed:

stan_gentoo firebird # isql localhost:data -user SYSDBA -pass masterkey
(following lines are printed after ~ 2 minutes...)
Statement failed, SQLCODE = -923
connection rejected by remote interface
Use CONNECT or CREATE DATABASE to specify a database
SQL>

If I "kill -9" the fbserver process and restart it, then
it works as though nothing happened and will hang again
after some time of intermittent high load.

My transactions are being committed and my connections are being closed.
I know this because the same application running on Windows does not
hang and runs for weeks, but much slower :).
The application uses the C API (ibase.h), mostly prepared statements
with some unprepared ones, and these transaction options:

static char isc_tpb[4] = {isc_tpb_version3,
isc_tpb_write,
isc_tpb_concurrency,
isc_tpb_wait};


I am setting the sweep interval to 0 and running a manual "gfix -sweep"
every 5 minutes. I also tried using the default garbage collection
of Firebird, but it hangs in the same way.

gstat output when fbserver hangs:

stan_gentoo firebird # gstat localhost:data -user SYSDBA -pass masterkey

Database "/home/stan/data.fdb"
Database header page information:
Flags 0
Checksum 12345
Generation 8132
Page size 16384
ODS version 11.0
Oldest transaction 8127
Oldest active 8128
Oldest snapshot 8128
Next transaction 8129
Bumped transaction 1
Sequence number 0
Next attachment ID 0
Implementation ID 19
Shadow count 0
Page buffers 0
Next header page 0
Database dialect 3
Creation date Sep 9, 2007 21:41:47
Attributes

Variable header data:
Sweep interval: 0
*END*


Database file sequence:
File /home/stan/data.fdb is the only file
(this line is printed after ~2 minutes)
connection rejected by remote interface

Oldest transaction (8127) and Next transaction (8129) are only two
apart, so there is no big gap in my transactions.
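
To make the "gap" check concrete, here is a minimal sketch in C that pulls the two counters out of gstat header text and subtracts them. The helper names gstat_field and transaction_gap are made up for this post and are not part of Firebird; the sketch assumes the header lines look like the output above ("label", whitespace, number).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a Firebird API: find "<label> <number>" in
   gstat header text and return the number, or -1 if it is not found. */
static long gstat_field(const char *header, const char *label)
{
    const char *p = strstr(header, label);
    long value;

    if (p == NULL)
        return -1;
    /* %ld skips the whitespace between the label and the number. */
    if (sscanf(p + strlen(label), "%ld", &value) != 1)
        return -1;
    return value;
}

/* Difference between "Next transaction" and "Oldest transaction".
   Returns -1 if either field is missing. */
static long transaction_gap(const char *header)
{
    long oldest = gstat_field(header, "Oldest transaction");
    long next = gstat_field(header, "Next transaction");

    if (oldest < 0 || next < 0)
        return -1;
    return next - oldest;
}
```

Fed the header above, transaction_gap() would see 8127 and 8129. A gap in the thousands or more is what I would expect from a stuck transaction, and that is not what I am seeing.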

The Firebird log also does not reveal anything useful: the 104
(ECONNRESET) and 111 (ECONNREFUSED) errors occur after fbserver hangs.
When fbserver hangs it is using 0% CPU, but still has its full virtual
memory. I am using the default config file and default DB options.
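
For anyone who wants to decode the errno values in the log, here is a small sketch in C. The function names inet_errno_name and inet_errno_text are my own, for illustration only; the numeric values in the comments are the Linux ones and may differ on other platforms.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map the two errno values seen in the log to
   their symbolic names. */
static const char *inet_errno_name(int err)
{
    switch (err) {
    case ECONNRESET:   return "ECONNRESET";   /* 104 on Linux */
    case ECONNREFUSED: return "ECONNREFUSED"; /* 111 on Linux */
    default:           return "(other)";
    }
}

/* Human-readable description of the same value from the C library. */
static const char *inet_errno_text(int err)
{
    return strerror(err);
}
```

Both codes describe the client side of a failed or dropped TCP connection, which fits my reading that they are a symptom of the hang rather than its cause.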

top output when fbserver hangs:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29018 root 17 0 156m 6588 3488 S 0.0 1.4 4:15.88 fbserver

If I attach GDB to the fbserver process while it hangs, I see
14 threads all blocked in __kernel_vsyscall:

(gdb) info threads
14 Thread -1234003056 (LWP 29021) 0xffffe410 in __kernel_vsyscall ()
13 Thread -1242461296 (LWP 29022) 0xffffe410 in __kernel_vsyscall ()
12 Thread -1293231216 (LWP 29039) 0xffffe410 in __kernel_vsyscall ()
11 Thread -1225147504 (LWP 29041) 0xffffe410 in __kernel_vsyscall ()
10 Thread -1327391856 (LWP 29873) 0xffffe410 in __kernel_vsyscall ()
9 Thread -1335981168 (LWP 29963) 0xffffe410 in __kernel_vsyscall ()
8 Thread -1344439408 (LWP 29969) 0xffffe410 in __kernel_vsyscall ()
7 Thread -1301623920 (LWP 30725) 0xffffe410 in __kernel_vsyscall ()
6 Thread -1318409328 (LWP 32057) 0xffffe410 in __kernel_vsyscall ()
5 Thread -1310016624 (LWP 4009) 0xffffe410 in __kernel_vsyscall ()
4 Thread -1352832112 (LWP 4010) 0xffffe410 in __kernel_vsyscall ()
3 Thread -1378010224 (LWP 4011) 0xffffe410 in __kernel_vsyscall ()
2 Thread -1361224816 (LWP 4012) 0xffffe410 in __kernel_vsyscall ()
1 Thread -1223653696 (LWP 29018) 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb71138f6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb71f34dd in pthread_cond_wait () from /lib/libc.so.6
#3 0x0806ea7a in ?? ()
#4 0x0807033d in ?? ()
#5 0x08070433 in ?? ()
#6 0x0820c9ec in ?? ()
#7 0x0821cc15 in ?? ()
#8 0x08058d87 in ?? ()
#9 0x0805375f in ?? ()
#10 0x0805b542 in ?? ()
#11 0xb714183c in __libc_start_main () from /lib/libc.so.6
#12 0x08052f81 in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread -1361224816 (LWP 4012))]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7115f5e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2 0xb7113eb5 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#3 0xb71f3406 in pthread_cond_broadcast () from /lib/libc.so.6
#4 0xfc438dff in ?? ()
#5 0x00218ee8 in ?? ()
#6 0x8de3eb00 in ?? ()
#7 0x84e8fc43 in ?? ()
#8 0xeb000021 in ?? ()
#9 0xffffba9a in ?? ()
#10 0x01b97fff in ?? ()
#11 0xb8000000 in ?? ()
#12 0x000000f0 in ?? ()
#13 0x1015ff65 in ?? ()
#14 0xeb000000 in ?? ()
#15 0x909090ad in ?? ()
#16 0x90909090 in ?? ()
#17 0x55909090 in ?? ()
#18 0x558be589 in ?? ()
#19 0x08458b0c in ?? ()
#20 0x00c7d285 in ?? ()
#21 0x00000000 in ?? ()

(gdb) thread 3
[Switching to thread 3 (Thread -1378010224 (LWP 4011))]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb71138f6 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2 0xb71f34dd in pthread_cond_wait () from /lib/libc.so.6
#3 0x0806ea7a in ?? ()
#4 0x0807033d in ?? ()
#5 0x08070433 in ?? ()
#6 0x081ab6f5 in ?? ()
#7 0x08072d12 in (anonymous namespace)::threadStart ()
#8 0xb710f4ab in start_thread () from /lib/libpthread.so.0
#9 0xb71e7f0e in clone () from /lib/libc.so.6

The backtraces of threads 4-13 look about the same as thread 3's.

(gdb) thread 14
[Switching to thread 14 (Thread -1234003056 (LWP 29021))]#0 0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb71157de in sem_wait@GLIBC_2.0 () from /lib/libpthread.so.0
#2 0x080530a8 in ?? ()
#3 0x08072d12 in (anonymous namespace)::threadStart ()
#4 0xb710f4ab in start_thread () from /lib/libpthread.so.0
#5 0xb71e7f0e in clone () from /lib/libc.so.6
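
For reference, the pthread calls in these backtraces (pthread_cond_wait in most threads, pthread_cond_broadcast in thread 2) are normally used together as in the sketch below. This is a generic illustration in C, not Firebird source, and it does not reproduce the hang; the names gate, worker, open_gate and run_demo are invented for this post.

```c
#include <pthread.h>

/* Generic condition-variable pattern, NOT Firebird code. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             open;   /* predicate, protected by lock */
};

/* Waiter: sleeps in pthread_cond_wait() until the gate is opened. */
static void *worker(void *arg)
{
    struct gate *g = arg;

    pthread_mutex_lock(&g->lock);
    while (!g->open)        /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->cond, &g->lock);
    pthread_mutex_unlock(&g->lock);
    return NULL;
}

/* Waker: sets the predicate and wakes every waiter. */
static void open_gate(struct gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->open = 1;
    pthread_cond_broadcast(&g->cond);
    pthread_mutex_unlock(&g->lock);
}

/* Start one worker, release it, and return 0 if it exited cleanly. */
static int run_demo(void)
{
    struct gate g;
    pthread_t t;
    int rc;

    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.cond, NULL);
    g.open = 0;

    if (pthread_create(&t, NULL, worker, &g) != 0)
        return -1;
    open_gate(&g);
    rc = pthread_join(t, NULL);

    pthread_cond_destroy(&g.cond);
    pthread_mutex_destroy(&g.lock);
    return rc;
}
```

In a healthy process the waiters return as soon as the broadcast happens. In my hung fbserver the waiters never wake up, and thread 2 is itself blocked on a lock inside the broadcast call.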

Firebird.log:

stan_gentoo (Client) Fri Sep 7 00:47:00 2007
/opt/firebird/bin/fbguard: guardian starting bin/fbserver

stan_gentoo (Client) Fri Sep 7 01:22:29 2007
INET/inet_error: connect errno = 111


stan_gentoo (Client) Fri Sep 7 01:22:32 2007
INET/inet_error: connect errno = 111


stan_gentoo (Client) Fri Sep 7 01:35:40 2007
INET/inet_error: read errno = 104


stan_gentoo (Client) Fri Sep 7 01:35:40 2007
INET/inet_error: read errno = 104


stan_gentoo (Client) Fri Sep 7 01:35:40 2007
INET/inet_error: read errno = 104


stan_gentoo (Client) Fri Sep 7 01:35:40 2007
INET/inet_error: read errno = 104


Please let me know if you have any ideas about what's going on
or if you need extra information. I am planning to get a binary
build of Firebird from firebirdsql.org and try that. After that
I am planning to get a binary debug build of Firebird and see
if the backtraces provide any more information.

Thank you in advance,

-Stan