Subject Re: [firebird-support] XNET error: get_free_slot() failed ? (2.1.1)
Author Olivier Mascia
On 3 Sept 2008, at 13:13, Vlad Khorsun wrote:

> Looks like http://tracker.firebirdsql.org/browse/CORE-1902, but that
> case was for Super Classic...

It looks different, because here xnet works very well, even with
numerous simultaneous connections, but sometimes seems to fail after a
large number of connects / disconnects over multiple hours or days.

> How many of your clients may connect simultaneously, within the same
> second? Does a little pause between connection attempts (if it's
> possible) help?

On the server where I detected this, the usage is low: around 40 to
50 attachments to the same DB. At times, 4 or 5 new attachments may be
attempted roughly "at the same time", but not more, and certainly not
as a routine. These attachments may, however, be disconnected and new
ones connected quite a high number of times per day. It is hard to
count them without instrumenting some of our code, but empirically I'd
say we can hit more than a thousand new attachments per 24h, and at
least hundreds. (Of course the software does not spend most of its
time connecting / disconnecting, but it can do so more often than some
other architectures do.)

Some attachments come in through TCP/IP (remote LAN processes), and
these are long-lasting attachments, but most of them are local
connections (from an application server) and usually last a few
minutes at most.

A pause won't change anything: when the error triggers, that's game
over until restart: no new connection can be made (though honestly I
don't know if tcp connections are still possible -- I'll test -- xnet
connections at least are not possible anymore).

> Of course, reproducible test case will help to resolve issue.

I'd love to isolate a reproducible test case, though for now I don't
have the slightest clue as to what exact circumstances, or how many
repetitions of a specific sequence of actions, can lead to this. It
just happened after 2 to 3 days of continuous operation, with certainly
thousands of connects / disconnects, mostly to the same DB. For one
week I restarted the service at night, and the problem didn't trigger.
I let it run free for the last 3 days, and bingo.
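
A first attempt at a test case could simply churn attachments in a
tight loop and note the cycle at which attaching starts to fail. The
sketch below is only that, a sketch: churn_attachments() and
LeakyFakeServer are hypothetical names of mine, and the fake server is
NOT Firebird. It merely simulates a fixed pool of slots where one slot
is not released on every Nth disconnect, so the loop itself can be
checked. For a real run, the connect argument would be whatever
callable opens a local attachment with the client library at hand.

```python
from typing import Any, Callable, Optional


def churn_attachments(connect: Callable[[], Any], max_cycles: int) -> Optional[int]:
    """Attach and detach repeatedly.

    Returns the 1-based cycle at which attaching first failed,
    or None if all max_cycles attach/detach cycles succeeded.
    """
    for cycle in range(1, max_cycles + 1):
        try:
            attachment = connect()
        except Exception as exc:  # any attach error ends the run
            print(f"attach failed on cycle {cycle}: {exc}")
            return cycle
        attachment.close()
    return None


class _FakeAttachment:
    def __init__(self, server: "LeakyFakeServer") -> None:
        self.server = server

    def close(self) -> None:
        server = self.server
        server.closes += 1
        # Simulated bug: every leak_every-th close does not free its slot.
        if server.closes % server.leak_every != 0:
            server.free_slots += 1


class LeakyFakeServer:
    """Hypothetical stand-in for a server with a fixed pool of slots."""

    def __init__(self, slots: int, leak_every: int) -> None:
        self.free_slots = slots
        self.leak_every = leak_every
        self.closes = 0

    def connect(self) -> _FakeAttachment:
        if self.free_slots == 0:
            raise RuntimeError("no free slot")
        self.free_slots -= 1
        return _FakeAttachment(self)


if __name__ == "__main__":
    server = LeakyFakeServer(slots=3, leak_every=10)
    print(churn_attachments(server.connect, max_cycles=1000))
```

If the real failure does come from a leak on disconnect, a loop like
this should hit it much sooner than 2 to 3 days of normal operation;
if it never fails, that points to some other trigger, such as
concurrency between attachments.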

I suspect a resource leak when an xnet attachment disconnects. I'll
try diffing the related sources between 2.0 and 2.1.1; that could
highlight something.

--
Olivier Mascia
T.I.P. Group S.A.
http://www.tipgroup.com