firebird-support - Firebird Recurring Hourly Load

Subject	Firebird Recurring Hourly Load
Author	David Hollings
Post date	2016-11-16T13:32:34Z

Hello All,

We've been experience for some time now instances of very high load that
lasts for a couple of seconds every 60 mins. Load which averages around 2 or
3 on an 8 core server will briefly spike to above 30 or 40.
Running Centos 6.x/7.x and Firebird 2.5.x classic it appears that exactly
every hour every firebird process wakes up for a couple of seconds.

For 2-3 seconds the server will be unresponsive with high IO wait and high
system calls - each process will have a thread in D state. A huge increase
in context switches is also noticed.

We have reproduced this on an idle test server running Centos 6.6 (kernel
2.6.32) with glibc 2.12 and debug build LI-V2.5.6.27020 with two firebird
connections.

Strace -c on the waking thread of one process logs the following:

% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000184 0 10258 1939 futex
0.00 0.000000 0 1 utime
0.00 0.000000 0 1 1 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000184 10260 1940 total

A backtrace of the thread is below:

#0 0x0000003331e0da21 in sem_timedwait () from /lib64/libpthread.so.0
#1 0x00007f2b30490810 in Firebird::SignalSafeSemaphore::tryEnter
(this=0x7f2b2e3fe5f0, seconds=<value optimized out>, milliseconds=<value
optimized out>) at ../src/common/classes/semaphore.cpp:171
#2 0x00007f2b3037ef41 in Jrd::ConfigStorage::touchThreadFunc
(this=0x7f2b2e3fbc78) at ../src/jrd/trace/TraceConfigStorage.cpp:352
#3 0x00007f2b3037f049 in Jrd::ConfigStorage::touchThread
(arg=0x7f2b2e3fe5f0) at ../src/jrd/trace/TraceConfigStorage.cpp:338
#4 0x00007f2b301788e2 in run (arg=0x7f2b2e5fe860) at
../src/jrd/ThreadStart.cpp:128
#5 (anonymous namespace)::threadStart (arg=0x7f2b2e5fe860) at
../src/jrd/ThreadStart.cpp:139
#6 0x0000003331e079d1 in start_thread () from /lib64/libpthread.so.0
#7 0x0000003331ae89dd in clone () from /lib64/libc.so.6

At the exact same time the other process will also wake and strace logs the
following over a period of ~1.5s:

[pid 17268] 10:41:00.000270 <... futex resumed> ) = -1 ETIMEDOUT (Connection
timed out)

.....
The below is repeated 6381 times:

[pid 17268] 10:41:00.000375 futex(0x7f73d439f010, FUTEX_WAIT, 2147509580,
NULL) = -1 EAGAIN (Resource temporarily unavailable)

The below is repeated 1943 times:

[pid 17268] 10:41:00.000428 futex(0x7f73d439f010, FUTEX_WAKE, 1) = 0

.....
[pid 17268] 10:41:01.446739 futex(0x7f73d417d5f0,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {1479289261, 446729000},
ffffffff <unfinished ...>

Only one process ever manages to touch the trace file (at 09:41 it was
thread 17268 and at 10:41 it was thread 25932):

[pid 17268] 09:41:00.000036 utime("/tmp/firebird/fb_trace_vQb00k", NULL) = 0

It looks to me like the purpose of touchThreadFunc is simply to call
SetFileTime for the trace file in tmp every hour.

Is there a reason why the processes are synchronised to wake at the same
moment every hour?
Why does each process attempt to set the last access time of the trace file
in tmp and are there any ideas as to what is causing the contention?

Thanks and regards,
David