firebird-support - my server's network (eth0) goes down during stress tests.

Subject	my server's network (eth0) goes down during stress tests.
Author	dirknaudts
Post date	2005-02-20T15:27:23Z

ServerConfig :
FB 1.5 SS (FirebirdSS-1.5.1.4481-0.i686.rpm)
on Redhat 9 2.4.20-28.9 smp i686
(NPTL disabled with LD_ASSUME_KERNEL=2.5.2, as per the release notes)

DB is dialect 3. DB uses only 2 UDFs (lpad and ascii_char, both from
ib_udf)

All client apps on Win machines, written in D7 using IBO 4.3A,
connecting to DB with xxx.xxx.xxx.xxx:/path and all using fbclient.dll

All PCs (including the server) are connected through a single switch,
a D-Link DES1024 R+ (which should be a good switch).

DB tool used to access DB's directly : IBExpert 2004.12.24

Situation:
- I put some serious stress on my DB, which means running about 10
apps at the same time, some inserting data, others fetching and
deleting it. (This process runs automatically and continuously.)
- I monitor the DB statistics from time to time (through IBExpert:
services, database statistics, header info) to verify whether my
transaction control is OK.

Problem:
At a given moment all apps lose their DB connection, and the server
can't even be pinged over the network anymore. (Pinging between other
stations on the same network still works, so the switch doesn't seem
to be the problem.)

Working directly on the server, it can ping itself but can't ping any
other address. isql, connecting to a local database via the server's
IP address, says it can't connect to the DB.

The easiest way to try to 'reproduce' it is by clicking 'retrieve
statistics' fast enough, but I've seen it 'crash' without retrieving
statistics as well.

Frequency:
It sometimes runs OK for several hours, and sometimes for only 10
minutes.
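
Because the failure shows up at unpredictable times, a small probe run
from one of the client machines could timestamp the moment the server
stops answering, so it can be matched against firebird.log. The
following is only a sketch: the function names are hypothetical, and
it assumes the server listens on Firebird's default port 3050.

```python
import socket
import time


def can_connect(host, port=3050, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    3050 is Firebird's default port; change it if the server
    is configured differently (assumption).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def watch(host, checks, interval=5.0, port=3050):
    """Probe the server `checks` times and return (timestamp, ok) pairs."""
    results = []
    for i in range(checks):
        ok = can_connect(host, port)
        results.append((time.strftime("%Y-%m-%d %H:%M:%S"), ok))
        if i < checks - 1:
            time.sleep(interval)
    return results
```

Calling watch() with the server's address and a large number of checks
gives a list in which the first False entry marks roughly when the
server dropped off the network.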

Observations:
In firebird.log I see a lot of
INET/inet_error:read errno = 104
INET/inet_error:read errno = 9
INET/inet_error:read errno = 111
INET/inet_error:select in packet_receive errno = 9

and one time:
internal gds software consistency check (invalid send request (167))

Every time I see a 111 error, I also see a log entry indicating a
server restart by the guardian, so I guess the last entry in the log
file before the crash at network level is errno = 111.
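
For reference, on Linux these errno values are 9 = EBADF, 104 =
ECONNRESET and 111 = ECONNREFUSED. A minimal sketch for tallying them
from firebird.log lines follows; the function names are hypothetical
and the lookup table covers only the three codes seen here.

```python
import re
from collections import Counter

# Linux errno numbers and their standard symbolic names.
# Only the codes observed in this log are listed.
LINUX_ERRNO = {
    9: "EBADF (bad file descriptor)",
    104: "ECONNRESET (connection reset by peer)",
    111: "ECONNREFUSED (connection refused)",
}

# Matches the "errno = N" suffix of INET/inet_error log lines.
ERRNO_RE = re.compile(r"errno\s*=\s*(\d+)")


def tally_errnos(lines):
    """Count how often each errno value appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def describe(code):
    """Return the symbolic name for a known errno, else a placeholder."""
    return LINUX_ERRNO.get(code, "unknown errno %d" % code)
```

Feeding it the lines of firebird.log (for example via open() on the
log file) shows which error dominates before each guardian restart.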

I'm really puzzled as to what's causing this behaviour. Whatever bug I
might still have in the software (or database), how is it possible
that the server's entire network interface (eth0) goes down?

Has anybody seen anything similar?
Any hints welcome...

Thanks

best regards,
Dirk Naudts.