firebird-java - tomcat dbcp jaybird: connection jumping thread barrier

Subject	tomcat dbcp jaybird: connection jumping thread barrier
Author	unordained
Post date	2011-06-07T21:25:07Z

Environment:
jaybird 2.1.6
firebird 2.1.4
tomcat 6.0.30 w/ dbcp

What I'm seeing:
In a high-load situation, two Tomcat worker threads that each started their own
Firebird connections will run into each other. They shouldn't.

How I found it:
I have code that uses rdb$set_context('USER_TRANSACTION', ...) to set a context
variable identifying the end-user (since the connection pool all uses one
generic username) for audit purposes. I call a SP that will raise an exception
if called multiple times, in an apparent attempt to change the active username
on a transaction. That started happening, so I first thought that context
variables might be leaking across contexts. But I added some debug code to see
what's going on, and found this instead:

DEBUG 14:13:23.239 r_339 db.mgt : use# 80 on conn# 32069 known
as trans# 160645
DEBUG 14:13:23.271 r_340 db.mgt : use# 3 on conn# 32066 known
as trans# 160647
DEBUG 14:13:23.380 r_340 base.ConnectionSingleton : uuu7: use# 80 on conn# 32069
already identified as user# XXXXXXXX known as trans# 160645

r_339 and r_340 are two threads working at nearly the same time. Each time I get
a Firebird connection from the pool, I output something like this:

l.debug("use# " + qry.singletonQuery(conn, "select
rdb$get_context('USER_SESSION', 'session_uses') from rdb$database", null) + " on " +
"conn# " + qry.singletonQuery(conn, "select current_connection from
rdb$database", null) + " known as " +
"trans# " + qry.singletonQuery(conn, "select current_transaction from
rdb$database", null));

I've checked that in-between these logs, no new connections were started, there
were no commits or rollbacks -- two threads started connections, one identified,
then the other, and the second one messed up and called the SP against the
connection started by the first. This happens very rarely (I see this error a
few times per day in a production environment, but it's a tiny percentage of the
time) and it's rather hard to reproduce.

What I'm doing:
For a variety of reasons, I'm storing my java.sql.Connection references in a
ConnectionSingleton class, of which there are several per ConnectionCollection,
which is ThreadLocal. Mostly I was tired of passing Connection references around
everywhere, for several different databases (Firebird, SQLServer, etc.) to be
used simultaneously. So at the beginning of its work, each thread sets up a
ConnectionCollection for its Thread, initializing only the ConnectionSingleton
objects that it'll need (maybe 2 out of 7 available datasources), and then later
code relies on that. Lots of catch/finally make sure that connections get rolled
back, and the whole ConnectionCollection gets removed from the ThreadLocal map
at the end of the request processing. (Not that it matters in this case -- two
threads just started working on two requests at nearly the same time, and before
either of them could do useful work, their beams got crossed.)

I'm pretty sure the problem isn't the ThreadLocal, because I added debugging
that tracks the thread ID that initialized the singleton, and I compare that to
the current thread ID when the code that blows up is run -- I'd get a different
exception if a thread were grabbing a different thread's ConnectionCollection
and uses the wrong Connection objects. So it looks like it's using the same
ConnectionSingleton it initialized, with the same java.sql.Connection it
originally got, but somehow ... it winds up talking to a different firebird
connection?

I have no idea how that's possible, but I'm quickly running out of things to
test. Please, please have some ideas for me!

-Philip