firebird-support - Re: [firebird-support] Re: Maximum number of sockets

Subject	Re: [firebird-support] Re: Maximum number of sockets
Author	Helen Borrie
Post date	2006-03-23T03:39:29Z

At 11:21 AM 23/03/2006, you wrote:

>We recently ran into this problem after a server restart.
>
>As part of our research into why clients were timing out even though
>the service was still apparently alive, I had a bit of a scan through
>the source (I'm no expert). It appears that Superserver tries to (or
>used to try to) validate packets before use. Hence, if the check
>passes, it shouldn't really get to the receive error on the main port:
>
>"SRVR_multi_thread/RECEIVE: error on main_port, shutting down"
>
>Is there an easy answer to this (I thought I'd ask here before
>bothering the dev list)? I.e., the code appears to stop receiving on
>the main port without shutting down.

Yes, there is. It has to do with what happens when a client crashes
its connection in the middle of sending a large packet to the server
and it is specific to Superserver. A "large packet" is typically a
blob or a long SQL statement that is larger than the packet size that
the network transport uses. The server knows in advance what size the
*database packet* is, so it keeps receiving *network packets* until
it has received the full database packet.
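
To make the "keep receiving until the full database packet has arrived" idea concrete, here is a minimal sketch in Python. It is not Firebird's actual wire protocol or code: the 4-byte length prefix and the names recv_exact and recv_database_packet are assumptions made purely for illustration.

```python
import socket
import struct


def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    remaining = n
    while remaining > 0:
        # Each recv() returns whatever the network has delivered so far,
        # which may be far less than the full database packet.
        chunk = sock.recv(min(remaining, 4096))
        if not chunk:
            raise ConnectionError("peer closed before the full packet arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_database_packet(sock):
    """Read one length-prefixed packet (hypothetical framing, not Firebird's)."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

Note that recv_exact blocks for as long as the sender stays silent without closing the socket. A server that runs this loop on the one thread that services every connection stalls with it.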

As Superserver stands in Fb 1.5 and every release before it, back to
when SS was first implemented in IB 4.1, it will just keep waiting
for network packets until it has received the full database
packet. The only thing that will stop it is when, eventually,
the broken socket times out (by Linux and Windows defaults, the first
keepalive probe is not even sent until the connection has been idle
for 2 hours). Meanwhile, all other requests from all other sockets will be
neglected and will, themselves, time out. Hence, the whole server is
effectively frozen by the action of this one misbehaving connection.
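
For reference, the two-hour figure is an operating-system default and can be shortened per socket. The sketch below shows the general technique in Python; it is OS-level socket tuning, not a Firebird setting, and the function name and the 60/10/5 values are assumptions chosen only for illustration.

```python
import socket


def enable_fast_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and, where the platform allows, shorten its timers.

    idle     : seconds of silence before the first probe is sent
    interval : seconds between unanswered probes
    probes   : unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    # The per-socket timer options are not exposed on every platform,
    # so each one is looked up defensively before use.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", probes),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied
```

With these example values a dead peer would be detected after roughly idle + interval * probes seconds on platforms that honour all three options. That shortens the freeze but does not remove it.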

For Firebird 2, Alex P. has refactored this loop so that these
incomplete packet-receives are recognised sooner as abandoned
connections and are removed from the loop. Up to and including
1.5.3, we have to take whatever measures we can to avoid the
misbehaviour, i.e. treat the symptoms if we can't avoid the
cause. As far as I know, the only way to get around an environment
that can't avoid it, e.g. bad site discipline, or users passing large
database packets across a slow connection, is to use Classic.
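
As a rough illustration of recognising an abandoned packet-receive sooner, here is a sketch in Python of a receive loop with an overall deadline. This is not the Firebird 2 implementation, which I am not reproducing here; the names AbandonedConnection and recv_exact_with_deadline and the max_wait parameter are hypothetical.

```python
import socket
import time


class AbandonedConnection(Exception):
    """Raised when a sender stops partway through a packet."""


def recv_exact_with_deadline(sock, n, max_wait):
    """Receive exactly n bytes, giving up after max_wait seconds in total."""
    deadline = time.monotonic() + max_wait
    buf = bytearray()
    while len(buf) < n:
        remaining_time = deadline - time.monotonic()
        if remaining_time <= 0:
            raise AbandonedConnection("deadline passed with %d of %d bytes" % (len(buf), n))
        # Never block longer than the time left before the deadline.
        sock.settimeout(remaining_time)
        try:
            chunk = sock.recv(n - len(buf))
        except socket.timeout:
            raise AbandonedConnection("sender stalled with %d of %d bytes" % (len(buf), n))
        if not chunk:
            raise AbandonedConnection("peer closed before the full packet arrived")
        buf += chunk
    return bytes(buf)
```

A caller would catch AbandonedConnection, close that one socket, and carry on servicing the others, so a single stalled client no longer holds up everyone else.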

./heLen