Subject | Re: Error reading data from the connection |
---|---|
Author | Greg Kay |
Post date | 2005-11-30T23:37:08Z |
Some extra information we got today. Looking at a Linux log, there are
segfaults at the same time as the client read connection errors. The
segfault entries look like this:
Nov 30 09:49:35 dbserver kernel: fb_inet_server[14083]: segfault at 000000004847c008 rip 00000000556469bb rsp 00000000ffffc4cc error 4
Does this indicate the server is responding to a lost connection or
are the segfaults causing the lost connection?
Greg
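
For anyone reading the log entry above, here is a minimal Python sketch that splits such a kernel line into its fields and decodes the low bits of the trailing error number. It assumes the message format shown in this post and assumes the error value is printed in hexadecimal; the names `SEGFAULT_RE`, `ERROR_BITS` and `decode_segfault` are made up for illustration. The bit meanings are the standard x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode).

```python
import re

# Matches the kernel message format quoted in this post (assumed format).
SEGFAULT_RE = re.compile(
    r"(?P<process>\S+)\[(?P<pid>\d+)\]: segfault at (?P<address>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<error>[0-9a-f]+)"
)

# Low three bits of the x86 page-fault error code:
# (mask, meaning when the bit is set, meaning when it is clear)
ERROR_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
]


def decode_segfault(line):
    """Return the fields of a kernel segfault line, or None if it does not match."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    code = int(match.group("error"), 16)  # assumption: printed in hex
    return {
        "process": match.group("process"),
        "pid": int(match.group("pid")),
        "address": int(match.group("address"), 16),
        "rip": int(match.group("rip"), 16),
        "reasons": [on if code & mask else off for mask, on, off in ERROR_BITS],
    }


sample = (
    "Nov 30 09:49:35 dbserver kernel: fb_inet_server[14083]: segfault at "
    "000000004847c008 rip 00000000556469bb rsp 00000000ffffc4cc error 4"
)
info = decode_segfault(sample)
print(info["process"], info["pid"], hex(info["address"]), info["reasons"])
```

For the entry above this reports a user-mode read of a not-present page at address 0x4847c008 by the fb_inet_server process with PID 14083. That only describes the faulting access; it does not say what led the process to that address.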
--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:
> At 04:36 AM 29/11/2005 +0000, you wrote:
>
> > > That said, you don't say anything about the server, either which model or
> > > what kind of configuration you are running it on. It might just be that
> > > 150 concurrent connections saps the server resources anyway, and you have
> > > one particularly humongous query that is habitually the last straw....
> > >
> > > ./heLen
> > >
> >
> >
> >We're using Classic 1.52 on Linux on an 8 GB dual-CPU Opteron 242. We have
> >no home-grown UDFs, and the SQL on which we get the "Error reading data
> >from the connection" problem is a simple short-running SELECT. We have
> >also recently reduced the number of concurrent connections from about
> >300 to the 150 with a reduction in the average CPU load from about 30%
> >down to 20%. With the 300 connections we weren't getting these
> >connection errors. We get about 40 of these intermittent connection
> >errors in a day and on different client machines. Looking at our own
> >logs it seems that each connection problem occurs after a period of
> >inactivity on the client (anywhere from 5 minutes to an hour).
>
> What about TCP/IP timeouts on the Windows clients? Have you checked that
> side of things out? Have some clients had patches/service packs installed
> that broke the network setup? Is DHCP jumping in and stealing IP addresses
> from quiet nodes?
>
> Has anyone changed the n/w configuration on the server?
>
>
> >Given the above extra info is it still a possibility that the server
> >might have glitches due to a crash or overloading
>
> "Server crash" isn't going to be an issue with Classic. A process could
> crash, but there is no server to crash. Nor overloading, either, if you
> weren't getting these errors with double the number of processes running,
> with exactly the same configuration and software, presumably? That should
> make it easier to find. Look for anything you or anyone else changed
> between then and now.
>
> > and that are not reported in the server log?
>
> They probably are: you said you were getting 104 errors. Those come from
> the network, essentially what the NOS reports when the xinetd checks on
> connections, i.e. processes (on Classic) that are meant to be there. If
> something has gone missing, that's what is logged.
>
> When you said the problem occurred every time at the same point in your
> Delphi app - can you make it happen reproducibly? If so, I'd want to put a
> monitor on the client to see exactly what happens before the connection dies.
>
> You wrote:
> > We have no home-grown UDF's and the SQL that we get the "Error reading
> > data from the connection" problem is a simple short running SELECT.
>
> OK, if not home-grown UDFs, what about FreeUDFLib UDFs? FreeUDFLib on Linux
> has a bit of a fraught history...*any* UDF calls in that simple short
> running SELECT?
>
> ./heLen
>