firebird-java - Re: [Firebird-Java] reuse of connection after failure.

Subject	Re: [Firebird-Java] reuse of connection after failure.
Author	Roman Rokytskyy
Post date	2007-03-12T19:49:49Z

Andrew Goedhart wrote:

> The real problem is that the connection keeps being returned by the Datasource when subsequent jobs request a connection, even though it is no longer usable. When the job tries to use the connection it dies horribly. This keeps on happening. The only way to recover is to kill the server and reload. We have about 30 other connections attached to the database from the current and other servers in the cluster. These seem to be okay.

Do you mean that the other servers do not notice the SEGV and the
subsequent FB restart? Do you use ClassicServer?

> This also happens on a newly restored database. The database is rather large: 180+ GB and growing rapidly. At last count the readings table had more than 1 billion records, so I guess we are working Firebird hard. I actually don't mind losing one or two readings (I never said that :-) from the incoming streams if I can keep the servers up. By the way, what is the maximum number of records per table for Firebird 2 servers? Firebird 1.5 had a limitation of 4 billion records per table. Has this changed, or do I need to start looking for a solution? Only half of my units are reporting into the new system currently.

In FB 2.0 the RDB$DB_KEY length was increased to 40 effective bits, so
it very likely solves your issue.
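
For a rough sense of scale, here is a small sketch that compares a 32-bit
and a 40-bit number space. It is plain arithmetic, not Firebird code. It
assumes that the 4 billion limit of 1.5 corresponds to a 32-bit record
number (my assumption), and the class and method names are made up for
illustration. The real ceiling also depends on page size and on how records
are packed, so treat the figures as upper bounds only.

```java
public class DbKeyRange {

    /** Number of distinct values representable in the given number of bits (bits < 63). */
    public static long distinctValues(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        long old32 = distinctValues(32);   // 4294967296, roughly 4.29 billion
        long new40 = distinctValues(40);   // 1099511627776, roughly 1.1 trillion

        System.out.println("32 bits: " + old32);
        System.out.println("40 bits: " + new40);
        System.out.println("factor:  " + (new40 / old32)); // 256
    }
}
```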

> Since the start of the discussion we have changed the JMS server to kill the worker thread and create a new one. This seems to help. It seems like JBoss is tracking the transactions per thread.

This is a common requirement of XA and/or EJB systems - there can be only
one transaction associated with the thread at any particular time. So I
can easily imagine that internally there is some map with Thread
instance as a key and Xid (or similar) as a value. BTW, do you see
memory growth here?
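
To illustrate what I have in mind, here is a minimal sketch of such a map.
This is not JBoss code and I have not checked how JBoss actually does it;
the class name ThreadTxRegistry and its methods are hypothetical, and a
plain String stands in for the Xid to keep the example self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadTxRegistry {

    // Thread instance as the key, transaction id (stand-in for an Xid) as the value.
    private final Map<Thread, String> active = new ConcurrentHashMap<>();

    /** Associates a transaction with the calling thread; only one is allowed at a time. */
    public void begin(String txId) {
        String existing = active.putIfAbsent(Thread.currentThread(), txId);
        if (existing != null) {
            throw new IllegalStateException(
                "Thread already has transaction " + existing);
        }
    }

    /** Returns the transaction of the calling thread, or null if there is none. */
    public String current() {
        return active.get(Thread.currentThread());
    }

    /** Removes the association for the calling thread (commit or rollback finished). */
    public void end() {
        active.remove(Thread.currentThread());
    }

    /** Number of threads that currently have a transaction registered. */
    public int size() {
        return active.size();
    }
}
```

With a structure like this, a new worker thread is a new key, so it starts
without any stale transaction. On the other hand, if a thread dies without
its entry being removed, the entry stays in the map, which is why I ask
about memory growth.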

> Previously we had huge problems with how JBoss handles failure on the JMS cluster and between ActiveMQ and Firebird. This is the reason for the XA transactions. This means that we load our custom JMS bean server as a SAR in JBoss and do manual XA transaction control. We are most likely initiating the rollback when we realise that something has gone wrong with the current job.

Ok.

> Local transactions seem to be a problem. I don't know why, but if I set the datasource to use only local transactions, after a while Firebird starts deadlocking and not recovering properly. After about 1 hour it snowballs and everything grinds to a halt. This generally happens around the status table, whose records are constantly being updated. I don't know why XA transactions avoid this, but they seem to. With XA transactions, provided that we don't get the link problem above, I can run the cluster for days before something else kills us. The status and vehicle tables are also among the few tables where we need to use explicit locking to serialize access to records and avoid continual rollbacks due to simultaneous updates.

This is very strange. How do you lock your records? Do you use a custom
TPB? What operations are executed within a single transaction?

> We currently run gfix as a cron job in the background. Limbo transactions tend to cause page load failures in the web interface, so we try to catch them as soon as possible. (The gfix is run every minute or so to clear limbo transactions. This currently does not seem to have a negative effect, but I welcome comments on the practice.)

Hopefully Helen can help here :)

> If I understand you correctly, you are saying that the connection pooling in this case is handled at the JBoss level and not at the Firebird level? This means that I have just been lucky for the last few hours :-) and that my changes to the defaults have no effect.

Correct. The pooling defaults are used only within the
org.firebirdsql.pool.* classes, but your JCA configuration is based on
org.firebirdsql.jca.* classes.

> Maybe the thread suicide is doing the job when it discovers an unrecoverable error?
>
> Any idea how you can force JBoss to kill a connection and not attempt to reuse it or recover? It seems that the recovery is failing in Firebird.

I doubt that one can prevent recovery - that is a standard part of the XA
protocol. I will check the JBoss docs to see if there's something we can do.

In the meantime, can you check your logs to see if any other exceptions
are there? I am interested in the most complete picture.

> By the way, how does one turn on tracing with Jaybird? Using the command line system property on JBoss and including org.firebirdsql = DEBUG in the log4J.xml file does not seem to be enough.

Strange. The code is the following (note, the def variable is always false):

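    String sLog4j = System.getProperty("FBLog4j");
    if (!def) {
        // default is off: log4j is enabled only if FBLog4j equals "true"
        if (sLog4j != null && sLog4j.equals("true"))
            log4j = true;
        else
            log4j = false;
    } else {
        // default is on: log4j is disabled only if FBLog4j equals "false"
        if (sLog4j != null && sLog4j.equals("false"))
            log4j = false;
        else
            log4j = true;
    }

    if (log4j) {
        // requested via the property; now verify that log4j can be loaded
        try {
            Class verify = Class.forName("org.apache.log4j.Category");
            log4j = true;
        }
        catch (ClassNotFoundException cnfe) {
            log4j = false;
        }
    }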

This code basically checks the system property and then checks whether
the org.apache.log4j.Category class is in the classpath. Maybe this
gives you an idea what is missing?
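
If it helps, the same two checks can be run in isolation with a small
sketch like the one below. The class FBLog4jCheck and its method names are
made up for this mail and are not part of Jaybird; only the FBLog4j property
name and the org.apache.log4j.Category class name are taken from the code
above. It covers the case where def is false.

```java
public class FBLog4jCheck {

    /** Same test as in the quoted code when def is false: the value must be exactly "true". */
    public static boolean propertyEnablesLog4j(String value) {
        return "true".equals(value);
    }

    /** Returns true if the named class can be loaded by the current class loader. */
    public static boolean classAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = System.getProperty("FBLog4j");
        System.out.println("FBLog4j = " + value);
        System.out.println("property enables log4j: " + propertyEnablesLog4j(value));
        System.out.println("log4j Category loadable: "
                + classAvailable("org.apache.log4j.Category"));
    }
}
```

Note that the comparison is case-sensitive, so -DFBLog4j=TRUE would not
enable logging. Also, the result of the class check depends on the class
loader, so a run from the command line may differ from what the driver sees
inside JBoss; calling the two methods from code deployed in the same place
as the driver is probably more telling.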

Roman