firebird-support - Re: Error reading data from the connection

Subject	Re: Error reading data from the connection
Author	Greg Kay
Post date	2005-11-30T23:37:08Z

Some extra information we got today. Looking at the Linux system log,
there are segfaults at the same times as the client "Error reading data
from the connection" errors. The segfault entries look like this:

Nov 30 09:49:35 dbserver kernel: fb_inet_server[14083]: segfault at
000000004847c008 rip 00000000556469bb rsp 00000000ffffc4cc error 4

Does this indicate the server is responding to a lost connection or
are the segfaults causing the lost connection?
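
For anyone wanting to line these entries up against client-side error times, here is a minimal Python sketch that pulls the fields out of a segfault line in the format quoted above and decodes the low three bits of the x86 page-fault error code. The function names (parse_segfault, decode_page_fault) are made up for illustration, and the error field is assumed to be printed in hex. Decoding the entry only says what kind of memory access faulted; it does not by itself show whether the segfault or the lost connection came first.

```python
import re

# Matches the kernel segfault line format quoted in this post.
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_page_fault(err):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if err & 1 else "page not present",
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


def parse_segfault(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # assumption: the kernel prints this in hex
    return {
        "proc": m.group("proc"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),   # faulting address
        "rip": int(m.group("rip"), 16),     # instruction pointer at the fault
        "rsp": int(m.group("rsp"), 16),     # stack pointer at the fault
        "fault": decode_page_fault(err),
    }


if __name__ == "__main__":
    entry = (
        "Nov 30 09:49:35 dbserver kernel: fb_inet_server[14083]: segfault at "
        "000000004847c008 rip 00000000556469bb rsp 00000000ffffc4cc error 4"
    )
    print(parse_segfault(entry))
```

For the entry above, "error 4" decodes as a user-mode read of a page that was not present, in the fb_inet_server process with pid 14083.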

Greg

--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@t...>
wrote:

>
> At 04:36 AM 29/11/2005 +0000, you wrote:
>
> > > That said, you don't say anything about the server, either which model or
> > > what kind of configuration you are running it on. It might just be that
> > > 150 concurrent connections saps the server resources anyway, and you have
> > > one particularly humongous query that is habitually the last straw....
> > >
> > > ./heLen
> > >
> >
> >
> >We're using Classic 1.52 on a Linux 8 Gb dual CPU opteron 242. We have
> >no home-grown UDF's and the SQL that we get the "Error reading data
> >from the connection" problem is a simple short running SELECT. We have
> >also recently reduced the number of concurrent connections from about
> >300 to the 150 with a reduction in the average CPU load from about 30%
> >down to 20%. With the 300 connections we weren't getting these
> >connection errors. We get about 40 of these intermittent connection
> >errors in a day and on different client machines. Looking at our own
> >logs it seems that each connection problem occurs after a period of
> >inactivity on the client (anywhere from 5 minutes to an hour).
>
> What about TCP/IP timeouts on the Windows clients? Have you checked that
> side of things out? Have some clients had patches/service packs installed
> that broke the network setup? Is DHCP jumping in and stealing IP addresses
> from quiet nodes?
>
> Has anyone changed the n/w configuration on the server?
>
>
>
> >Given the above extra info is it still a possibility that the server
> >might have glitches due to a crash or overloading
>
> "Server crash" isn't going to be an issue with Classic. A process could
> crash, but there is no server to crash. Nor overloading, either, if you
> weren't getting these errors with double the number of processes running,
> with exactly the same configuration and software, presumably? That should
> make it easier to find. Look for anything you or anyone else changed
> between then and now.
>
> > and that are not reported in the server log?
>
> They probably are: you said you were getting 104 errors. Those come from
> the network, essentially what the NOS reports when the xinetd checks on
> connections, i.e. processes (on Classic) that are meant to be there. If
> something has gone missing, that's what is logged.
>
> When you said the problem occurred every time at the same point in your
> Delphi app - can you make it happen reproducibly? If so, I'd want to put a
> monitor on the client to see exactly what happens before the connection dies.
>
> You wrote:
> > We have no home-grown UDF's and the SQL that we get the "Error reading
> > data from the connection" problem is a simple short running SELECT.
>
> OK, if not home-grown UDFs, what about FreeUDFLib UDFs? FreeUDFLib on Linux
> has a bit of a fraught history...*any* UDF calls in that simple short
> running SELECT?
>
> ./heLen
>