Subject Re: [firebird-support] Classic Server stops working overnight (socket errors?)
Author Helen Borrie
At 12:22 PM 27/03/2008, you wrote:
>Hi there,
>
>Our Classic Server stopped working overnight and we had to switch to
>Super Server to get it to work. Switching back to Classic Server
>brought up the same problems.
>
>Symptoms:
>* We've been running Classic Server for a long period of time
>without any real problems.
>
>* No problems on Monday, but the next day nobody could get into the
>Firebird database via ODBC. We tested with isql and same result.
>When I used a local connection string (filepath) I was able to get
>access to the Firebird database.
>
>* We restarted the server (twice).

Rebooted the host machine? Or stopped and restarted the fb_inet_server service?


>* We did telnets from one other computer to the firebird server
>(port 3050) and that worked fine.
>
>* No Microsoft updates. No event log errors. We shut down the virus
>program, but that made no difference.
>
>* Evaluation of the Firebird log file (see below) led us to believe the
>errors were related to the socket layer (SO_KEEPALIVE, SO_NODELAY
>and inet errors).
>
>Versions:
>On Monday they were running version 2.0.1.12855
>On Tuesday we installed 2.0.3.12981 (same problems)
>ODBC 1.2.0.69 (same problem with isql though)
>Win 5.2.3790
>
>Workaround:
>We found a suggestion on a Spanish (!) website suggesting to run
>Firebird in Super Server mode (I can't quite remember how we got to
>that website though, as we were quite busy at the time <grin>). That
>worked. Switching back to Classic Server however gave the same
>problems. That particular customer has a multi-core CPU so it would
>definitely be nice if we could somehow run it in Classic Server mode
>again.

Merely "switching" without changing the cache size would cause problems all by itself.

Did you run the Guardian service while you had SS installed? Did you remove it when you reverted to Classic?


>Our guess:
>Maybe an update of some program also updated the winsock dlls which
>in turn caused some problem in the socket library of the Classic
>Server? (Mind you we're just guessing here)

Neither the Classic nor the SS server has a "socket library". The client DLL uses whichever network transport it deduces from the protocol of the connection string. Which transport layer do you use, i.e., what does your remote connection string look like?
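To illustrate how the form of the connection string implies the transport, here is a rough sketch in Python. It is not Firebird's own parsing logic: the function name guess_protocol and the matching rules are assumptions for illustration, based only on the commonly documented forms (hostname:path or hostname/port:path for TCP/IP, \\hostname\path for WNet, a bare file path for a local connection).

```python
import re


def guess_protocol(conn: str) -> str:
    """Guess the transport implied by a Firebird-style connection string.

    Hypothetical helper for illustration; the real client applies its own rules.
    """
    # WNet (named pipes): \\hostname\d:\path\to\database
    if conn.startswith("\\\\"):
        return "WNet"
    # A single letter followed by ":\" or ":/" is treated as a Windows drive,
    # not as a host name, so this is a local file path.
    if re.match(r"^[A-Za-z]:[\\/]", conn):
        return "local"
    # TCP/IP: hostname:path or hostname/port:path
    if re.match(r"^[^:\\/]+(/[^:\\/]+)?:", conn):
        return "TCP/IP"
    # Anything else (e.g. a POSIX path) is assumed to be a local file path.
    return "local"


if __name__ == "__main__":
    for s in (r"\\hostname\d:\path\to\database",
              r"hostname:d:\path\to\database",
              r"d:\path\to\database"):
        print(s, "->", guess_protocol(s))
```

One limitation of the sketch: a one-letter host name is indistinguishable from a drive letter here, so it would be reported as local.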

>If anyone is interested in running a few tests and trying to get to
>the bottom of it, just let me know... (Our customer works pretty
>much round the clock with different shifts, so I might not be able
>to test everything immediately, but I'll do my best)

It doesn't sound like a familiar etiology at all, so I can't think of anything else you could test, from a software POV. It has the appearance of an intermittent network fault, where "working" vs "not working" becomes coincidental. There *is* an issue with isc_que_events that appeared in v.2.0.3 and affects you if you're using WNet protocol (\\hostname\d:\path\to\database) but it wasn't present in v.2.0.1.

With WNet protocol you would of course be "hobbled" as to the number of concurrent connections. Is it possible that an unusually high load, or a backlog of ghost connections, might have tipped things over the limit? (Though your log excerpts show that, at least at those points, you were using TCP/IP not WNet). Along the same lines, do you have any explicit limits in your TCP/IP setup that could have been tipped over by excessive extant connections?

Overall, if you were able to connect to SS through port 3050 then there's no reason why you wouldn't be able to connect through port 3050 to Classic, as long as the network didn't have a fault. It seems likely there is a factor there that you haven't uncovered yet, such as a router or the host's NIC playing up, which would give the 10054 (connection reset by peer) errors and also inhibit KEEPALIVE.
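If you want something more repeatable than a telnet session, a small probe run from a client machine can help. This is only a sketch in Python: the function name probe and its return strings are made up for illustration, and it exercises only the client side of the connection, so a clean result does not rule out a fault that shows up on the server's own sockets. It sets the same two options your log complains about (what you call SO_NODELAY is TCP_NODELAY in the sockets API).

```python
import socket


def probe(host: str, port: int = 3050, timeout: float = 5.0) -> str:
    """Open a TCP connection and set KEEPALIVE / NODELAY on it.

    Hypothetical diagnostic helper: returns "ok" or a short failure description.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The two socket options mentioned in the log excerpts.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        return "ok"
    except ConnectionResetError as exc:
        # On Windows this is where WSAECONNRESET (10054) would surface.
        return f"reset by peer: {exc}"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return f"failed: {exc}"


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the database host's name.
    print(probe("localhost"))
```

Running it in a loop over a few hours, and noting when it stops returning "ok", would show whether the failures are intermittent.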

Did you try just reinstalling Classic v.2.0.1 at all?

./heLen