Subject: Re: [firebird-support] Re: Another "connection forcibly closed by the remote host" problem
Author: Helen Borrie
At 02:53 AM 28/06/2006, you wrote:
>--- In firebird-support@yahoogroups.com, Helen Borrie <helebor@...> wrote:
> >
> > It's not a Firebird bug but a Windows one. So use DPI at your own
> > peril. When next you post a message with the subject: "Unexplained
> > server crash", or "Dead connections don't go away", don't forget to
> > mention that you applied this workaround to avoid researching the
> > actual reason that the socket keepalive function is timing out early
> > in your environment....
> >
> > ./heLen
> >
>
>Helen - If you've read or followed this thread at all for the past
>month, you should know that I've done little else in that time besides
>work to pin-point what is causing this problem so that I could report
>the solution back here.

Sure, I've been following it. I noticed that for at least part of
that period you were popping in pieces of a problem description
without (initially) knowing that VMWare was in the
picture...I've seen people offer suggestions and observations and
I've seen a lot of "It doesn't work for me...."

>So please don't come at me with "you applied
>this workaround to avoid researching the actual reason that the socket
>keepalive function is timing out early in your environment..." That's
>out of line, and is beneath you. I've done almost nothing for the past
>5 weeks but research the actual reason.

Have you seen this paper?

http://www.vmware.com/support/gsx25/doc/network_nat_gsx.html

At that page you can also subscribe to a gsx user group, which is
where I suspect you will find your ultimate answer for getting around
gsx's default timeout.

Some other papers that might be relevant are:
http://nutss.gforge.cis.cornell.edu/pub/imc05-tcpnat.pdf
http://www.aspfaq.com/show.asp?id=2544
and even possibly this:
http://www.portlock.com/documents/Manually_Setting_up_TCP.pdf

>And despite all the other
>helpful advice and suggestions I've gotten from all the experts here,
>Dmitry's workaround is the only thing that has worked. I have at
>least managed to pin down the configuration where the problem occurs.

"Dmitry's workaround" is actually the way that socket keepalive was
implemented in Fb 1.0.x and, before that, in InterBase. It's really
no more than a ping-like trick. It was a frequent cause of problems
on Windoze, hence the reimplementation in Fb 1.5 and onward to use
the proper TCP/IP keepalive setting - to avert the problems!! So, by
re-enabling it, you're getting the pingish polling from the server
but you're reintroducing a source of *other* problems.
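
To illustrate what "the proper TCP/IP keepalive setting" means at the
socket level, here is a minimal Python sketch. It is not Firebird's
code; the function names and the 60-second idle value are made up for
illustration. It only shows the general idea: the application sets
SO_KEEPALIVE and the operating system sends the probes, instead of the
server sending its own dummy packets.

```python
import socket


def make_keepalive_socket(idle_seconds=60):
    """Create a TCP socket with OS-level keepalive enabled.

    idle_seconds is a hypothetical value chosen for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to probe an idle connection on the application's behalf.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket idle time is platform-specific; set it only where
    # Python exposes TCP_KEEPIDLE. Elsewhere the system-wide default applies.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock


def keepalive_enabled(sock):
    """Return True if SO_KEEPALIVE is switched on for this socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    s = make_keepalive_socket()
    print("keepalive enabled:", keepalive_enabled(s))
    s.close()
```

With this approach the probe timing is owned by the OS (and by anything
in between, such as a NAT layer), which is why an early timeout has to
be investigated in that environment rather than in the database server.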

The point of my comment was that - at least from where I'm sitting -
the problem you really need to solve is how to manage the VMWare
server's timeout. The right way to research this isn't by beating
the Firebird support list to death but by asking the right question
in the right place - the VMWare forums.

>I have tested virtually every combination of
>single/multiple/Xeon/Pentium processors, Win2000/Win2003,
>VMWare/Non-virtual server, and FB versions 1.0.3/1.5.2/1.5.3/2.0, and
>the ONLY combination where the disconnections occur is on a multiple
>Xeon physical machine, Linux OS hosting a VMWare ESX VM configured to
>use one CPU, with Win2003 OS running in the VM, with Firebird v1.5.2,
>v1.5.3, or 2.0. Like I've said before in this thread, I can't find the
>1.5.0 or 1.5.1 install kits to test them. If someone knows where I can
>find those, let me know so I can determine the earliest FB version
>that is affected.

That's the easy one: http://prdownloads.sourceforge.net/firebird

./heLen