Subject | select bug on Linux!! |
---|---|
Author | Brad Pepers |
Post date | 2001-05-03T06:27:27Z |
While looking into the delay problem I have, I started to think it's a common
problem on Linux systems. The cause is that select() on Linux behaves
differently than on just about every other Unix system: it modifies the
passed-in timeout value to reflect the time that was not slept, so after the
call the timeout holds only whatever time remained. Common code for doing a
select looks like this:
    struct timeval timeout;
    timeout.tv_sec = 60;
    timeout.tv_usec = 0;
    for (;;) {
        int result = select(max_handle, read_set, write_set,
                            error_set, &timeout);
        ... various checks ...
    }
The problem is that every time select returns, the timeout value is smaller,
until it eventually reaches zero and no longer acts as a timeout at all.
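For what it's worth, here is a minimal sketch of the usual way around this: keep the intended interval in its own variable and hand select() a fresh copy on every pass, since the fd sets get overwritten by the call as well. The function name wait_for_data and its parameters are made up for illustration and are not taken from the Firebird source, and I have only reasoned through it, not run it against Firebird.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe() and write(), only needed to try the function out */

/* Hypothetical helper: wait for fd to become readable, giving up after
 * max_timeouts full intervals.  Returns the number of timeouts that
 * happened before data arrived (or max_timeouts), or -1 on error. */
static int wait_for_data(int fd, struct timeval interval, int max_timeouts)
{
    int timeouts = 0;

    assert(fd >= 0 && fd < FD_SETSIZE);

    for (;;) {
        fd_set read_set;
        /* Fresh copy each pass: select() may shrink this value on Linux. */
        struct timeval timeout = interval;
        int result;

        /* The fd set is also overwritten by select(), so rebuild it too. */
        FD_ZERO(&read_set);
        FD_SET(fd, &read_set);

        result = select(fd + 1, &read_set, NULL, NULL, &timeout);
        if (result < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal, try again */
            return -1;
        }
        if (result > 0)
            return timeouts;       /* fd is readable */
        if (++timeouts >= max_timeouts)
            return timeouts;       /* gave up after max_timeouts intervals */
    }
}
```

With a 20 ms interval and an empty pipe, three timeouts should take roughly 60 ms in total instead of the second and third returning immediately.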
The reason I bring this up is that the code in remote/inet.c does a very
similar loop in the packet_receive code. It sets up the timeout outside of
the loop and then loops waiting for packets. After the first timeout, though,
it's no longer going to be doing a timeout at all and will instead be in a
fairly busy loop. As long as it doesn't receive anything and get out of the
loop, it will stay in the busy loop.
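To show what a zero timeout does in practice, here is a small sketch that calls select() with a zeroed timeval on a descriptor that never has data and counts how often it returns within a given time budget. The names count_zero_timeout_returns and elapsed_ms are my own and purely illustrative; this is not the inet.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>   /* pipe(), only needed to try the function out */

/* Milliseconds elapsed since *start, measured with gettimeofday(). */
static long long elapsed_ms(const struct timeval *start)
{
    struct timeval now;
    gettimeofday(&now, NULL);
    return ((long long)(now.tv_sec - start->tv_sec) * 1000000LL
            + (long long)(now.tv_usec - start->tv_usec)) / 1000LL;
}

/* Hypothetical demo: spin on select() with a zero timeout for budget_ms
 * milliseconds and return how many times it reported a timeout.
 * Returns -1 if select() fails or unexpectedly reports fd as readable. */
static long count_zero_timeout_returns(int fd, long budget_ms)
{
    struct timeval start;
    long count = 0;

    assert(fd >= 0 && fd < FD_SETSIZE);
    gettimeofday(&start, NULL);

    while (elapsed_ms(&start) < budget_ms) {
        fd_set read_set;
        struct timeval timeout;

        FD_ZERO(&read_set);
        FD_SET(fd, &read_set);
        /* What the decremented timeout ends up as: zero means "just poll". */
        timeout.tv_sec = 0;
        timeout.tv_usec = 0;

        if (select(fd + 1, &read_set, NULL, NULL, &timeout) != 0)
            return -1;
        count++;
    }
    return count;
}
```

On the read end of an empty pipe I would expect the count to come back very large even for a budget of a few tens of milliseconds, which is exactly the busy loop described above. A loop that really waited 60 seconds per call would report zero.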
The problem is compounded by the code that is trying to check if the clients
are alive. It does this by sending a dummy request to the client after every
timeout. If the client is around, it will just ignore it. If it's dead,
there will be an error. But after the first timeout, this becomes a constant
bombardment of the clients, since the loop is now busy with a zero timeout!
I think my problem is that the client gets bogged down by the constant stream
of dummy keep-alive requests this causes, and that's why it takes so long to
respond.
So at least this one place in the code should be fixed. Sadly, I don't have a
build environment to test this, but I hope someone else can soon, since I
think it's a pretty major bug! A general review of the use of select in
Firebird should also be done in case there are other places with the same
problem on Linux.
--
Brad Pepers
brad@...