Subject Re: [firebird-support] connection refused, fbserver processes rising
Author Helen Borrie
At 01:43 AM 21/03/2006, you wrote:
>Hello all,
>
>last week we experienced some strange problems with our Firebird
>server. All connections were refused while the number of fbserver
>processes kept rising. Details:
>
>- FirebirdSS-1.5.2.4731-0 running on SuSE Standard Server 8 with
>kernel k_smp-2.4.21-273
>- Clients are various selfmade applications written in Borland Delphi
>
>
>No connection possible. So again I did "rcfirebird stop" and "killall
>fbserver". Here's the part of firebird.log from startup on Saturday to
>"killall" on Monday morning:
>
>samba1 (Client) Sat Mar 18 17:31:48 2006
> /opt/firebird/bin/fbguard: guardian starting bin/fbserver
>samba1 (Client) Mon Mar 20 07:59:54 2006
> INET/inet_error: read errno = 104
>samba1 (Client) Mon Mar 20 07:59:54 2006
> INET/inet_error: receive in try_connect errno = 104
>samba1 (Client) Mon Mar 20 08:00:53 2006
> /opt/firebird/bin/fbguard: bin/fbserver terminated abnormally (-1)
>
>Nothing in /var/log/messages or /var/log/warn.
>
>We have some processes on different clients that try to connect to the
>database every few minutes: nagios and two of our own Delphi apps.
>Something seems to go wrong in a way that every db connection gets
>refused and each connection attempt leaves an fbserver process in memory.
>
>My questions:
>- What does the error code 104 in the log mean? I also find some
>errno=4 and 9 in the logs. On firebird.sourceforge.net I could only
>find the SQL error table. Where's the documentation for the error
>codes in the logfile?

The INET errors are TCP/IP errors. 104 (ECONNRESET, "connection reset
by peer" on Linux) means that the other end dropped the connection
abruptly: either the server crashed or a client crashed. Given the
number of fbserver threads that are running when problems start
occurring, and the fact that the problems started only recently, I
would suspect a Denial of Service attack. You start to see network
errors once network, resource and fbserver limits are reached. ~1000
concurrent connections is beyond the practical limits for SS.

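The single-digit error codes are OS errno values coming from SuSE: on
Linux, 4 is EINTR (interrupted system call) and 9 is EBADF (bad file
descriptor). You should be able to find what's causing them by
studying the manual.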
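To check what those numbers mean on the server itself, a short sketch like the following, using only Python's standard `errno` and `os` modules, translates an errno value into its symbolic name and message. Errno numbering is platform-specific, so it should be run on the Linux box that wrote firebird.log. `describe_errno` is a hypothetical helper name, not part of any Firebird tool.

```python
import errno
import os

# Error numbers seen in firebird.log lines such as "INET/inet_error: read errno = 104".
LOGGED_ERRNOS = [104, 4, 9]


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an OS errno value on the current platform."""
    # errno.errorcode maps numbers to symbolic names; unknown numbers get a placeholder.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the C library's message text for the number.
    return f"{name}: {os.strerror(code)}"


for code in LOGGED_ERRNOS:
    print(code, describe_errno(code))
```

On a Linux server, 104 should come out as ECONNRESET; the same numbers can mean something else on other operating systems.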

Let's suppose that it's not your Delphi client applications
themselves that are causing this problem, considering that you
didn't have such problems before in 2 years of use.

Because you are using TCP/IP, not all of those fbserver threads are
necessarily client attachments. The server creates threads for
various tasks of its own: garbage collection and other housekeeping.
And the server can allocate a new attachment to a thread that is
still "alive" but has nothing to do. However, if the number of
fbserver threads is a lot higher than the actual number of connected
users, then *something* naughty is going on.
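One way to compare the thread count with the real number of client connections is to count established TCP connections to the Firebird port, grouped by client address. The sketch below assumes Firebird listens on its default port 3050 and parses the kernel's /proc/net/tcp table, which covers IPv4 only (IPv6 connections appear in /proc/net/tcp6). The function names are hypothetical, chosen for illustration.

```python
import socket
import struct
from collections import Counter

FIREBIRD_PORT = 3050    # assumption: default port, not changed in firebird.conf
TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp


def hex_to_ipv4(hex_addr: str) -> str:
    """Convert a /proc/net/tcp address (u32 printed in host byte order) to dotted form."""
    # Packing in native order restores the original network-order bytes.
    return socket.inet_ntoa(struct.pack("=I", int(hex_addr, 16)))


def connections_per_client(proc_net_tcp: str, port: int = FIREBIRD_PORT) -> Counter:
    """Count ESTABLISHED connections to the local `port`, keyed by remote IPv4 address."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        local_port = int(local.split(":")[1], 16)  # ports are hex as well
        if local_port == port and state == TCP_ESTABLISHED:
            counts[hex_to_ipv4(remote.split(":")[0])] += 1
    return counts
```

On the server, `connections_per_client(open("/proc/net/tcp").read()).most_common(10)` would list the busiest client addresses; a single machine holding far more connections than it has users is a good place to start looking.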

If it is malicious, think about a recently fired employee, especially
if you didn't change the passwords when s/he left. Or an unhappy
person in-house.

If it is accidental, then look for recently hired users who have
access to the database to run ad hoc queries, or for network hardware
that is subject to intermittent faults (allowing a user to connect but
then losing the connection intermittently). Maybe a user who just
started using a wireless LAN connection? Or one who "cures" a slow
query by crashing out of whatever query tool he is using? Or several
users with new machines or recent Windows upgrades, where a faulty
TCP/IP setup drops the LAN connection when the user accesses the
Internet?

Basically, abnormally dropped connections take about 2 hours to time
out (the default TCP keepalive idle time on Linux is 7200 seconds).
Worse, if a connection is dropped while the request-processing
subsystem is waiting to receive the rest of a packet that is larger
than the network packet size, e.g. a blob or a long SQL query, then
the whole system will seem to freeze and will stay that way until
keepalive has kicked in and the timeout period, counted from the last
keepalive packet received from that client, has run out.
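How long a silently dropped, idle connection can linger is simple arithmetic on the Linux keepalive settings: the idle time before the first probe, plus the number of unanswered probes times the interval between them. The sketch below assumes the socket has SO_KEEPALIVE enabled, reads the values from /proc/sys/net/ipv4, and falls back to the defaults documented in tcp(7) (7200, 75 and 9) when a value can't be read. The helper names are hypothetical.

```python
from pathlib import Path

# Linux defaults from tcp(7); the live values are under /proc/sys/net/ipv4.
DEFAULTS = {
    "tcp_keepalive_time": 7200,  # idle seconds before the first keepalive probe
    "tcp_keepalive_intvl": 75,   # seconds between unanswered probes
    "tcp_keepalive_probes": 9,   # unanswered probes before the connection is dropped
}


def read_keepalive_settings(base: str = "/proc/sys/net/ipv4") -> dict:
    """Read the keepalive sysctls, using the documented default for any unreadable one."""
    settings = {}
    for name, default in DEFAULTS.items():
        try:
            settings[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = default
    return settings


def dead_peer_detection_seconds(s: dict) -> int:
    """Seconds after the last packet received before keepalive drops an idle dead connection."""
    return s["tcp_keepalive_time"] + s["tcp_keepalive_probes"] * s["tcp_keepalive_intvl"]
```

With the defaults that comes to 7200 + 9 * 75 = 7875 seconds, a little over 2 hours 11 minutes. Lowering net.ipv4.tcp_keepalive_time with sysctl shortens that window for every keepalive-enabled socket on the machine, at the cost of more probe traffic.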

Finally, was Nessus added to the picture anywhere recently? Nessus
is not friendly to Firebird 1.5.

./heLen