Subject | Re: [firebird-support] XNET error: get_free_slot() failed ? (2.1.1) |
---|---|
Author | Olivier Mascia |
Post date | 2008-09-03T12:34:07Z |
On 3 Sept 2008, at 13:13, Vlad Khorsun wrote:
> Looks like http://tracker.firebirdsql.org/browse/CORE-1902, but that
> case was for Super Classic...

It looks different because here xnet works very well, even with
numerous simultaneous connections, but seems to fail sometimes after a
large number of connects / disconnects over multiple hours or days.
> How many your clients may connect at the same second
> simultaneously ?
> Does a little pause between connection attempts (if its possible)
> helps ?

On the server where I detected this, the usage is low: around 40 to
50 attachments to the same DB. At times 4 or 5 new attachments may be
attempted roughly "at the same time", but not more, and certainly not
as a routine. These attachments may however be disconnected and new
ones connected quite a high number of times per day. It is hard to
count them without instrumenting some of our code, but empirically I'd
say we can hit more than a thousand new attachments per 24h, and at
least hundreds. (Of course the software does not spend most of its
time connecting / disconnecting, but it can do so more often than some
other architectures do.)

Some attachments come in through TCP/IP (remote LAN processes), and
these are long-lasting, but most of them are local connections (from
an application server) and typically last a few minutes at most.

A pause won't change anything: when the error triggers, it's game
over until restart. No new connection can be made (though honestly I
don't know if TCP connections are still possible -- I'll test -- xnet
connections at least are not possible anymore).
> Of course, reproducible test case will help to resolve issue.

I'd love to isolate a reproducible test case, though for now I have
not the smallest clue as to what exact circumstances, or how many
repetitions of a specific sequence of actions, can lead to this. It
just happened after 2 to 3 days of continuous operation with certainly
thousands of connects / disconnects, mostly to the same DB. For one
week I restarted the service at night, and the problem didn't trigger.
I let it run free for the last 3 days, and bingo.

I suspect a resource leak on xnet attachment disconnect. I'll try
diffing the related sources between 2.0 and 2.1.1; that could
highlight something.
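
A possible starting point for a test case is a loop that attaches and
detaches over and over and stops at the first failed attach. The
following is only a sketch under assumptions: `churn` and its
parameters are hypothetical names, and `connect` stands for whatever
driver call opens a local attachment. Nothing here comes from the
Firebird sources or from our application code, and it is not known
whether plain churn alone is enough to trigger the error.

```python
import time
from typing import Callable, Optional, Tuple


def churn(connect: Callable[[], object],
          max_cycles: int,
          pause_s: float = 0.0) -> Tuple[int, Optional[Exception]]:
    """Attach and detach repeatedly; stop at the first failed attach.

    `connect` is a hypothetical zero-argument callable that returns an
    object with a close() method (supply your own driver call here).
    Returns (number_of_successful_cycles, first_attach_error_or_None).
    """
    done = 0
    for _ in range(max_cycles):
        try:
            con = connect()
        except Exception as exc:
            # The attach itself failed: report how far we got and why.
            return done, exc
        # Detach immediately, mimicking short-lived attachments.
        # An error raised by close() is not caught and will propagate.
        con.close()
        done += 1
        if pause_s:
            time.sleep(pause_s)
    return done, None
```

The callable passed as `connect` would wrap the driver's own connect
call with a local connection string. The returned count gives a rough
idea of how many connect / disconnect cycles were completed before the
first failure, if any.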
--
Olivier Mascia
T.I.P. Group S.A.
http://www.tipgroup.com