firebird-support - Re: [firebird-support] Re: Firebird SS-1.5.1 and RedHat ES 4 troubles

Subject	Re: [firebird-support] Re: Firebird SS-1.5.1 and RedHat ES 4 troubles
Author	Helen Borrie
Post date	2006-02-17T11:21:11Z

At 06:56 PM 17/02/2006, you wrote:

>Helen,
> follows what reported from firebird.log between 2 server restart:
>
>saert2.unit.net (Client) Wed Feb 15 04:08:54 2006
> /opt/firebird/bin/fbguard: guardian starting bin/fbserver
>
>
>saert2.unit.net (Server) Wed Feb 15 04:09:52 2006
> INET/inet_error: select in packet_receive errno = 9
>
>saert2.unit.net (Server) Wed Feb 15 06:33:35 2006
> INET/inet_error: select in packet_receive errno = 9
>
>saert2.unit.net (Server) Wed Feb 15 07:12:50 2006
> INET/inet_error: read errno = 9
>
>saert2.unit.net (Server) Wed Feb 15 07:53:24 2006
> INET/inet_error: select in packet_receive errno = 9
>
>saert2.unit.net (Server) Wed Feb 15 09:18:00 2006
> INET/inet_error: read errno = 104
>
>saert2.unit.net (Server) Wed Feb 15 09:18:00 2006
> INET/inet_error: read errno = 104
>
>saert2.unit.net (Server) Wed Feb 15 09:44:39 2006
> INET/inet_error: read errno = 9
>
>saert2.unit.net (Server) Wed Feb 15 13:19:14 2006
> INET/inet_error: read errno = 9
>
>saert2.unit.net (Server) Wed Feb 15 15:56:55 2006
> INET/inet_error: read errno = 9

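So far, the server is losing contact with clients occasionally
through the day. On Linux, errno 9 is EBADF (bad file descriptor) and
errno 104 is ECONNRESET (connection reset by peer). An application
crashing or timing out, or just blips on the network, could cause
these interruptions. Are they using wireless?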
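
To see whether the errors cluster by type or by time of day, it can help
to tally them rather than read the log by eye. The following is only a
rough sketch in Python, not a Firebird tool: the function name
tally_inet_errors and the small errno table are made up for illustration,
and the table holds the Linux meanings of just the three numbers seen in
this excerpt.

```python
import re
from collections import Counter

# Linux meanings of the errno values in this log excerpt.
# Assumption: the server runs on Linux; the numbers differ on other platforms.
LINUX_ERRNO = {
    9: "EBADF",           # bad file descriptor
    104: "ECONNRESET",    # connection reset by peer
    111: "ECONNREFUSED",  # connection refused
}

# Matches entries such as "INET/inet_error: read errno = 9".
INET_ERROR = re.compile(
    r"INET/inet_error:\s*(?P<op>.+?)\s+errno\s*=\s*(?P<errno>\d+)"
)


def tally_inet_errors(log_text):
    """Count INET/inet_error entries by (operation, errno name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = INET_ERROR.search(line)
        if match:
            number = int(match.group("errno"))
            name = LINUX_ERRNO.get(number, "errno %d" % number)
            counts[(match.group("op"), name)] += 1
    return counts
```

Feed it the text of firebird.log (wherever that lives on your
installation) and print counts.most_common() to see which operation and
error dominate.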

>saert2.unit.net (Client) Wed Feb 15 19:42:58 2006
> INET/inet_error: connect errno = 111

Now a local client wants to connect but the server is unavailable
(errno 111 is ECONNREFUSED, connection refused).

>saert2.unit.net (Client) Wed Feb 15 19:42:58 2006
> /opt/firebird/bin/fbguard: guardian starting bin/fbserver

Guardian is a watchdog program that restarts the server after it has
crashed. In this case, Guardian has been kicked into life by a local client.

>Any idea ?

Not really. Something is crashing your client threads (3-finger
salute?). At some point after 4 pm, the server process itself
crashes. Have you looked at the system log for nasty words like SEGFAULT?
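
A plain grep of the system log does the job, but if you would rather
script the search, here is a minimal sketch in Python. The keyword list
is a guess rather than a complete catalogue of kernel messages, the
function name suspicious_lines is hypothetical, and /var/log/messages is
only the usual location on Red Hat systems, so adjust it if yours differs.

```python
import re

# Crash-related words to look for. Assumption: this list is illustrative,
# not an exhaustive set of what the kernel or syslog may print.
SUSPECT = re.compile(
    r"segfault|segmentation fault|general protection|out of memory|oom-killer",
    re.IGNORECASE,
)


def suspicious_lines(lines, process_hint="fbserver"):
    """Return lines with a crash keyword, those naming the server first."""
    hits = [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(hits, key=lambda line: process_hint not in line.lower())


# Example use (the path is an assumption; reading it usually needs root):
# with open("/var/log/messages", errors="replace") as log:
#     for hit in suspicious_lines(log):
#         print(hit)
```

Compare the timestamps of any hits with the 19:42:58 restart in
firebird.log.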

Incidentally, that kernel bug I thought I sort-of remembered
yesterday was a red herring. It's one that affects dual-core
Opterons, where a bug in a memory management subsystem was causing
segfaults in Classic processes on that hardware and, as I recall, was
fixed with a kernel patch.

If this were me at this stage, I would be ready to take the
conservative approach: uninstall the 1.5.3 NPTL package, replace it
with the 1.5.3 "old threading" package and set
LD_ASSUME_KERNEL=2.2.5 in both of the places suggested in the release
notes. That should at least get you back to a stable threading situation.
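
Once the variable is set, it is worth confirming that the running server
actually inherited it. On Linux, /proc/<pid>/environ holds the
environment a process was started with, as NUL-separated KEY=VALUE
entries. A small sketch in Python, assuming you already know the
fbserver pid (from ps, say) and have permission to read that file; the
function names are invented for the example.

```python
def parse_environ(raw):
    """Parse the NUL-separated KEY=VALUE block from /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def assume_kernel_of(pid):
    """Return LD_ASSUME_KERNEL as the process saw it at startup, or None."""
    # Reading this file normally requires root or the process owner.
    with open("/proc/%d/environ" % pid, "rb") as handle:
        return parse_environ(handle.read()).get("LD_ASSUME_KERNEL")
```

If assume_kernel_of(pid) returns None for the server process, the
setting did not reach it, and the startup path that launched the server
is the place to look.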

In reviewing your original posting, I see this:
"The only DDL task that the application accessing the db
does, is to create/drop dinamically some very simple computed fields
in a couple of tables. But this not happens very often."

- how often? once a year? once a month? once a week? every 2-3
days? more often sometimes?
- do these computed fields involve UDFs?
- WHY is your application code performing dynamic DDL at all?

I don't know what else to look at. I've got 1.5.2 SS without NPTL
support running on Mandriva 2005 on an AMD Sempron 2200 (no
hyperthreading) and it's trouble-free.

Hopefully someone has been where you are and can throw some light on this.

./heLen