Subject Re: Firebird SS-1.5.1 and RedHat ES 4 troubles
Author axsp2000
--- In firebird-support@yahoogroups.com, "axsp2000" <a.spinetti@...>
wrote:
>
> --- In firebird-support@yahoogroups.com, Helen Borrie <helebor@> wrote:
> >
> > At 06:56 PM 17/02/2006, you wrote:
> >
> > >Helen,
> > > what follows is what firebird.log reported between two server restarts:
> > >
> > >saert2.unit.net (Client) Wed Feb 15 04:08:54 2006
> > > /opt/firebird/bin/fbguard: guardian starting bin/fbserver
> > >
> > >
> > >saert2.unit.net (Server) Wed Feb 15 04:09:52 2006
> > > INET/inet_error: select in packet_receive errno = 9
> > >
> > >saert2.unit.net (Server) Wed Feb 15 06:33:35 2006
> > > INET/inet_error: select in packet_receive errno = 9
> > >
> > >saert2.unit.net (Server) Wed Feb 15 07:12:50 2006
> > > INET/inet_error: read errno = 9
> > >
> > >saert2.unit.net (Server) Wed Feb 15 07:53:24 2006
> > > INET/inet_error: select in packet_receive errno = 9
> > >
> > >saert2.unit.net (Server) Wed Feb 15 09:18:00 2006
> > > INET/inet_error: read errno = 104
> > >
> > >saert2.unit.net (Server) Wed Feb 15 09:18:00 2006
> > > INET/inet_error: read errno = 104
> > >
> > >saert2.unit.net (Server) Wed Feb 15 09:44:39 2006
> > > INET/inet_error: read errno = 9
> > >
> > >saert2.unit.net (Server) Wed Feb 15 13:19:14 2006
> > > INET/inet_error: read errno = 9
> > >
> > >saert2.unit.net (Server) Wed Feb 15 15:56:55 2006
> > > INET/inet_error: read errno = 9
> >
> > So far, the server is losing contact with clients occasionally
> > through the day. An application crashing or timing out, or just
> > blips on the network, could cause these interferences. Are they
> > using wireless?
> >
> > >saert2.unit.net (Client) Wed Feb 15 19:42:58 2006
> > > INET/inet_error: connect errno = 111
> >
> > Now a local client wants to connect but the server is unavailable.
> >
> >
> > >saert2.unit.net (Client) Wed Feb 15 19:42:58 2006
> > > /opt/firebird/bin/fbguard: guardian starting bin/fbserver
> >
> > Guardian is a watchdog program that restarts the server after it has
> > > crashed. In this case, Guardian has been kicked into life by a
> > > local client.
> >
> > >Any idea ?
> >
> > Not really. Something is crashing your client threads (3-finger
> > salute?). At some point after 4 pm, the server process itself
> > > crashes. Have you looked at the system log for nasty words like
> > > SEGFAULT?
> >
> > Incidentally, that kernel bug I thought I sort-of remembered
> > yesterday was a red herring. It's one that affects dual-core
> > Opterons, where a bug in a memory management subsystem was causing
> > segfaults in Classic processes on that hardware and, as I recall, was
> > fixed with a kernel patch.
> >
> > If this was me at this stage, I would be ready to take the
> > conservative approach: uninstall the 1.5.3 NPTL package, replace it
> > with the 1.5.3 "old threading" package and set the
> > LD_ASSUME_KERNEL=2.2.5 in both of the places suggested in the release
> > > notes. That should at least get you back to a stable threading
> > > situation.
> >
> > In reviewing your original posting, I see this:
> > "The only DDL task that the application accessing the db
> > > does is to dynamically create/drop some very simple computed fields
> > > in a couple of tables. But this does not happen very often."
> >
> > - how often? once a year? once a month? once a week? every 2-3
> > days? more often sometimes?
> > - do these computed fields involve UDFs?
> > - WHY is your application code performing dynamic DDL at all?
> >
> > I don't know what else to look at. I've got 1.5.2 SS without NPTL
> > support running on Mandriva 2005 on an AMD Sempron 2200 (no
> > hyperthreading) and it's trouble-free.
> >
> > Hopefully someone has been where you are and can throw some light.
> >
> > ./heLen
> >
>
>
> Helen,
> I checked the log: no SEGFAULT anywhere.
> The Firebird server is frozen.
> The fbmgr "shut" command times out after 2 minutes with the message:
> "unable to complete network request to host 'localhost'
> failed to establish a connection
> can not attach to server"
>
> There is no way to restart it with fbmgr. Killing fbserver manually
> solves the problem.
>
> Running gstat on a database gives the following result after a couple
> of minutes of waiting:
> Database "/DATABASE/WELL_DBS/Remsa_01.gdb"
>
> Database header page information:
> Flags 0
> Checksum 12345
> Generation 92693
> Page size 4096
> ODS version 10.1
> Oldest transaction 91779
> Oldest active 91780
> Oldest snapshot 58861
> Next transaction 92686
> Bumped transaction 1
> Sequence number 0
> Next attachment ID 0
> Implementation ID 19
> Shadow count 0
> Page buffers 0
> Next header page 0
> Database dialect 3
> Creation date Jul 26, 2005 13:14:53
> Attributes
>
> Variable header data:
> Sweep interval: 20000
> *END*
>
>
> Database file sequence:
> File /DATABASE/WELL_DBS/Remsa_01.gdb is the only file
>
> Database log page information:
> Creation date
> Log flags: 2
> No write ahead log
>
> Next log page: 0
>
> Variable log data:
> Control Point 1:
> File name:
> Partition offset: 0 Seqno: 0 Offset: 0
> Control Point 2:
> File name:
> Partition offset: 0 Seqno: 0 Offset: 0
> Current File:
> File name:
> Partition offset: 0 Seqno: 0 Offset: 0
> *END*
> Unable to complete network request to host "localhost".
> -Failed to establish a connection.
>
> With "top", following results :
> VIRT RES SHR %CPU TIME
> fbguard 3340 1296 S 0 0:0:0
> fbserver 325m 28m S 0 2:21:68
> 73 tasks running
> 72 tasks sleeping
> 0 stopped
> 0 zombie
> CPU 1.0% us 4.4% sy 0.0% ui 90.5% id 0.0% wa 0.7% hi 2.0% si
> MEM 256044k total 254040k used 2008k free 27264k buffer
> SWA 265064k total 192k used 264872k free 150368k cached
>
> The computed fields use UDFs, but they are usually created/dropped
> once a month. Sometimes more often, but that is not the case now.
>
> WHY is your application code performing dynamic DDL at all?
> Because sometimes we need to show some computations on some reports
> to a lot of clients, and this was the simplest and quickest way without
> involving another application layer.
>
> We'll try the non-NPTL version, as you suggested.
>
> Thank you
>
> Alessandro
>

Helen,
following your suggestion, installing the non-NPTL version and setting
the LD_ASSUME_KERNEL variable as described in the documentation
fixed the problem. So it seems that some incompatibility exists between
Firebird SS 1.5.1/1.5.3 NPTL and RedHat ES 4.

Thank you again

Alessandro
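
P.S. For anyone reading this thread later: the errno values in the
firebird.log excerpt quoted above are Linux error numbers. On Linux,
9 is EBADF, 104 is ECONNRESET and 111 is ECONNREFUSED, which fits
Helen's reading (clients dropping their connections, then a connect
attempt while nothing was listening). Below is a minimal sketch, not
anything shipped with Firebird, that counts the INET/inet_error entries
in a log by operation and errno. The function names, the lookup table
and the sample text are only illustrative assumptions, and the numbers
in the table are specific to Linux.

```python
import re
from collections import Counter

# Linux-specific errno numbers seen in the log above (assumption: the
# server runs on Linux; other platforms number these differently).
LINUX_ERRNO = {
    9: ("EBADF", "Bad file descriptor"),
    104: ("ECONNRESET", "Connection reset by peer"),
    111: ("ECONNREFUSED", "Connection refused"),
}

# Matches lines such as "INET/inet_error: read errno = 104".
INET_ERROR = re.compile(
    r"INET/inet_error:\s*(?P<op>.+?)\s+errno\s*=\s*(?P<errno>\d+)"
)


def summarize_inet_errors(log_text):
    """Count INET/inet_error entries, keyed by (operation, errno)."""
    counts = Counter()
    for match in INET_ERROR.finditer(log_text):
        counts[(match.group("op"), int(match.group("errno")))] += 1
    return counts


def describe(errno_value):
    """Return a readable name for an errno from the small table above."""
    name, text = LINUX_ERRNO.get(errno_value, ("UNKNOWN", "not in this table"))
    return f"{name} ({text})"


# A few entries copied from the log quoted in this thread.
SAMPLE = """
saert2.unit.net (Server) Wed Feb 15 04:09:52 2006
 INET/inet_error: select in packet_receive errno = 9

saert2.unit.net (Server) Wed Feb 15 09:18:00 2006
 INET/inet_error: read errno = 104

saert2.unit.net (Server) Wed Feb 15 09:18:00 2006
 INET/inet_error: read errno = 104

saert2.unit.net (Client) Wed Feb 15 19:42:58 2006
 INET/inet_error: connect errno = 111
"""

if __name__ == "__main__":
    for (op, code), n in sorted(summarize_inet_errors(SAMPLE).items()):
        print(f"{n} x {op}: errno {code} = {describe(code)}")
```

To use it on a real log, read the file's text (for example the
firebird.log under the Firebird install directory) and pass it to
summarize_inet_errors() instead of SAMPLE.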