firebird-java - Re: [Firebird-Java] reuse of connection after failure.

Subject	Re: [Firebird-Java] reuse of connection after failure.
Author	David Jencks
Post date	2007-03-13T01:06:45Z

I'm glad to see people are actually using firebird xa.

IIUC part of the problem is that managed connections that are
definitely dead are not getting removed from the pool.

I can think of 2 possible causes for this:

1. jaybird is not emitting a ConnectionErrorEvent to registered
ConnectionEventListeners when this error occurs. I'd expect that
jaybird should emit such an event whenever you try to use a dead
connection.
1.a. jaybird is emitting such an event, but it doesn't have the
managed connection set on it, so jboss can't figure out what to do.

2. jboss is not removing the MC from its pool and destroying the MC
when it receives the error event.

Figuring out which of these is happening should be pretty easy with a
debugger or some logging inserted in the code.
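
One way to do the logging is to wrap the listener the pool registers and record what reaches it. The following is only a minimal sketch: ErrorEvent, ErrorListener and DiagnosticListener are hypothetical stand-ins that mirror the shape of the JCA listener contract so the example compiles on a plain JDK. They are not the real javax.resource.spi types and not jaybird or jboss code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT the real JCA interfaces.
interface ErrorEvent {
    Object getManagedConnection(); // null models case 1.a
    Exception getException();
}

interface ErrorListener {
    void connectionErrorOccurred(ErrorEvent event);
}

// Wraps the pool's own listener, records what arrives, then delegates.
class DiagnosticListener implements ErrorListener {
    private final ErrorListener delegate;
    private final List<String> findings = new ArrayList<>();

    DiagnosticListener(ErrorListener delegate) {
        this.delegate = delegate;
    }

    @Override
    public void connectionErrorOccurred(ErrorEvent event) {
        if (event.getManagedConnection() == null) {
            findings.add("event without managed connection (case 1.a)");
        } else {
            findings.add("event with managed connection (suspect case 2)");
        }
        delegate.connectionErrorOccurred(event);
    }

    // Call this after provoking the failure.
    String verdict() {
        return findings.isEmpty()
                ? "no error event seen (case 1)"
                : findings.get(findings.size() - 1);
    }
}
```

If the verdict after a failed job is still "no error event seen", the driver side is the place to look. If an event with a managed connection did arrive and the dead connection is handed out again anyway, the pool side is the more likely suspect.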

thanks
david jencks

On Mar 12, 2007, at 3:49 PM, Roman Rokytskyy wrote:

> Andrew Goedhart wrote:
> > The real problem is that the connection keeps on being returned on
> > request for a connection from the Datasource at the start of the
> > subsequent jobs. It is no longer usable. When the job tries to use
> > the connection it dies horribly. This keeps on happening. The only
> > way to recover is to kill the server and reload. We have +- 30
> > other connections attached to the database from the current and
> > other servers in the cluster. These seem to be okay.
>
> Do you mean that other servers do not notice SEGV and subsequent FB
> restart? Do you use ClassicServer?
>
> > This also happens on a newly restored database. The database is
> > rather large: 180+GB and growing rapidly. At last count the readings
> > table had about 1+ billion records, so I guess we are working
> > Firebird hard. I actually don't mind losing one or two readings (I
> > never said that :-) from the incoming streams if I can keep the
> > servers up. By the way, what is the maximum number of records per
> > table for Firebird 2 servers? Firebird 1.5 had a limitation of 4
> > billion records per table. Has this changed, or do I need to start
> > looking for a solution? Only half of my units are reporting into
> > the new system currently.
>
> In FB 2.0 the RDB$DB_KEY length was increased to 40 effective bits, so
> it very likely solves your issue.
>
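
For a sense of scale, the arithmetic behind the two figures can be checked with a few lines of Java. This is only a sketch: it assumes the "4 billion" limit quoted for 1.5 corresponds to a 32-bit record number, and it computes the addressable range only. The number of records a table can actually hold may be lower, since it also depends on page and record layout. The class name is made up for the example.

```java
class RecordNumberRange {
    // Number of distinct values representable in the given number of bits.
    static long distinctValues(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        System.out.println("32 bits: " + distinctValues(32)); // 4294967296
        System.out.println("40 bits: " + distinctValues(40)); // 1099511627776
    }
}
```

So 40 bits gives 256 times the range of 32 bits, roughly 1.1 trillion values against roughly 4.3 billion.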
> > Since the start of the discussion we have changed the JMS server
> > to kill the worker thread and create a new one. This seems to
> > help. It seems like JBoss is tracking the transactions per thread.
>
> This is a common requirement in XA and/or EJB systems - there can be
> only one transaction associated with the thread at any particular time.
> So I can easily imagine that internally there is some map with a Thread
> instance as a key and an Xid (or similar) as a value. BTW, do you see
> memory growth here?
>
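
The kind of map imagined above could look roughly like this. It is a hypothetical illustration and not JBoss's actual implementation: the class name ThreadTxRegistry is invented, and a plain String stands in for the Xid.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ThreadTxRegistry {
    // Thread instance as key, transaction id as value (String stands in for an Xid).
    private final Map<Thread, String> active = new ConcurrentHashMap<>();

    // Associates a transaction with the calling thread; refuses a second one.
    void begin(String txId) {
        String existing = active.putIfAbsent(Thread.currentThread(), txId);
        if (existing != null) {
            throw new IllegalStateException(
                    "thread already associated with " + existing);
        }
    }

    // Must be called on the same thread that called begin().
    void end() {
        active.remove(Thread.currentThread());
    }

    int size() {
        return active.size();
    }
}
```

In a registry shaped like this, a thread that dies without calling end() leaves its entry behind, which is why the question about memory growth is worth checking when worker threads are being killed and recreated.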
> > We previously had huge problems with how JBoss handles failure on
> > the JMS cluster and between ActiveMQ and Firebird. This is the
> > reason for the XA transactions. This means that we load our custom
> > JMS bean server as a SAR in JBoss and do manual XA transaction
> > control. We are most likely initiating the rollback when we
> > realise that something has gone wrong with the current job.
>
> Ok.
>
> > Local transactions seem to be a problem. I don't know why, but if
> > I set the datasource to use only local transactions, after a while
> > Firebird starts deadlocking and not recovering properly. After
> > about 1 hour it snowballs and everything grinds to a halt. This is
> > generally around the status table, whose records are constantly
> > being updated. I don't know why XA transactions avoid this but they
> > seem to. With XA transactions, provided that we don't get the link
> > problem above, I can run the cluster for days before something else
> > kills us. The status and vehicle tables are also among the few
> > tables where we need to use explicit locking to serialize access to
> > records and avoid continual rollbacks due to simultaneous updates.
>
> This is very strange. How do you lock your records? Do you use a custom
> TPB? What operations are executed within a single transaction?
>
> > We currently run gfix as a cron job in the background. Limbo
> > transactions tend to cause page load failures in the web interface,
> > so we try to catch them as soon as possible. (The gfix is run
> > every minute or so to clear limbo transactions. This currently does
> > not seem to have a negative effect, but I welcome comment on the
> > practice.)
>
> Hopefully Helen can help here :)
>
> > If I understand you correctly, you are saying that the connection
> > pooling in this case is handled at the JBoss level and not at the
> > Firebird level? This means that I have just been lucky for the
> > last few hours :-) and that my changes to the defaults have no effect.
>
> Correct. The pooling defaults are used only within the
> org.firebirdsql.pool.* classes, but your JCA configuration is based on
> org.firebirdsql.jca.* classes.
>
> > Maybe the thread suicide is doing the job when it discovers an
> > unrecoverable error?
> >
> > Any idea how you can force JBoss to kill a connection and not
> > attempt to reuse it or recover? It seems that the recovery is
> > failing in Firebird.
>
> I doubt that one can prevent recovery - that is a standard part of the
> XA protocol. I will check the JBoss docs to see if there's something we
> can do.
>
> In the meantime, can you check your logs to see if any other exceptions
> are there? I am interested in the most complete picture.
>
> > By the way, how does one turn on tracing with Jaybird? Using the
> > command line system property on JBoss and including org.firebirdsql
> > = DEBUG in the log4J.xml file does not seem to be enough.
>
> Strange. The code is the following (note, the def variable is
> always false):
>
> String sLog4j = System.getProperty("FBLog4j");
> if (!def) {
>     if (sLog4j != null && sLog4j.equals("true"))
>         log4j = true;
>     else
>         log4j = false;
> } else {
>     if (sLog4j != null && sLog4j.equals("false"))
>         log4j = false;
>     else
>         log4j = true;
> }
>
> if (log4j) {
>     try {
>         Class verify = Class.forName("org.apache.log4j.Category");
>         log4j = true;
>     }
>     catch (ClassNotFoundException cnfe) {
>         log4j = false;
>     }
> }
>
> This code basically checks the system property and then checks whether
> the org.apache.log4j.Category class is in the classpath. Maybe this
> gives you an idea what is missing?
>
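
The two conditions in the quoted code can be checked in isolation with a small standalone class. This is a sketch for diagnosis only: the class name FbLog4jCheck and its methods are made up for the example and are not part of Jaybird. It reads the same FBLog4j system property and probes for the same org.apache.log4j.Category class as the quoted snippet.

```java
class FbLog4jCheck {
    // Same decision as the quoted code: the property only overrides the default.
    static boolean wantsLog4j(boolean def, String prop) {
        return def ? !"false".equals(prop) : "true".equals(prop);
    }

    // True if the named class can be loaded from the caller's class loader.
    static boolean classPresent(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String prop = System.getProperty("FBLog4j");
        System.out.println("FBLog4j property: " + prop);
        // def is false in the quoted code, so only the exact string "true" enables log4j.
        System.out.println("property requests log4j: " + wantsLog4j(false, prop));
        System.out.println("org.apache.log4j.Category visible: "
                + classPresent("org.apache.log4j.Category"));
    }
}
```

Run it with -DFBLog4j=true on the command line. Note that the comparison is case-sensitive, so a value such as TRUE would not enable logging. Inside an application server the result of the class lookup can also differ from a standalone run, because it depends on which class loader performs it.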
> Roman
>
>
