Subject Re: [Firebird-Java] tomcat dbcp jaybird: connection jumping thread barrier
Author Mark Rotteveel
On 7-6-2011 23:25, unordained wrote:
> Environment:
> jaybird 2.1.6
> firebird 2.1.4
> tomcat 6.0.30 w/ dbcp
>
> What I'm seeing:
> In a high-load situation, two Tomcat worker threads that each started their own
> Firebird connections will run into each other. They shouldn't.
>
> How I found it:
> I have code that uses rdb$set_context('USER_TRANSACTION', ...) to set a context
> variable identifying the end-user (since the connection pool all uses one
> generic username) for audit purposes. I call a SP that will raise an exception
> if called multiple times, in an apparent attempt to change the active username
> on a transaction. That started happening, so I first thought that context
> variables might be leaking across contexts. But I added some debug code to see
> what's going on, and found this instead:
>
> DEBUG 14:13:23.239 r_339 db.mgt : use# 80 on conn# 32069 known
> as trans# 160645
> DEBUG 14:13:23.271 r_340 db.mgt : use# 3 on conn# 32066 known
> as trans# 160647
> DEBUG 14:13:23.380 r_340 base.ConnectionSingleton : uuu7: use# 80 on conn# 32069
> already identified as user# XXXXXXXX known as trans# 160645
>
> r_339 and r_340 are two threads working at nearly the same time. Each time I get
> a Firebird connection from the pool, I output something like this:
>
> l.debug("use# " + qry.singletonQuery(conn, "select
> rdb$get_context('USER_SESSION', 'session_uses') from rdb$database", null) + " on " +
> "conn# " + qry.singletonQuery(conn, "select current_connection from
> rdb$database", null) + " known as " +
> "trans# " + qry.singletonQuery(conn, "select current_transaction from
> rdb$database", null));
>
> I've checked that in-between these logs, no new connections were started, there
> were no commits or rollbacks -- two threads started connections, one identified,
> then the other, and the second one messed up and called the SP against the
> connection started by the first. This happens very rarely (I see this error a
> few times per day in a production environment, but it's a tiny percentage of the
> time) and it's rather hard to reproduce.
>
> What I'm doing:
> For a variety of reasons, I'm storing my java.sql.Connection references in a
> ConnectionSingleton class, of which there are several per ConnectionCollection,
> which is ThreadLocal. Mostly I was tired of passing Connection references around
> everywhere, for several different databases (Firebird, SQLServer, etc.) to be
> used simultaneously. So at the beginning of its work, each thread sets up a
> ConnectionCollection for its Thread, initializing only the ConnectionSingleton
> objects that it'll need (maybe 2 out of 7 available datasources), and then later
> code relies on that. Lots of catch/finally make sure that connections get rolled
> back, and the whole ConnectionCollection gets removed from the ThreadLocal map
> at the end of the request processing. (Not that it matters in this case -- two
> threads just started working on two requests at nearly the same time, and before
> either of them could do useful work, their beams got crossed.)
>
> I'm pretty sure the problem isn't the ThreadLocal, because I added debugging
> that tracks the thread ID that initialized the singleton, and I compare that to
> the current thread ID when the code that blows up is run -- I'd get a different
> exception if a thread were grabbing a different thread's ConnectionCollection
> and uses the wrong Connection objects. So it looks like it's using the same
> ConnectionSingleton it initialized, with the same java.sql.Connection it
> originally got, but somehow ... it winds up talking to a different firebird
> connection?
>
> I have no idea how that's possible, but I'm quickly running out of things to
> test. Please, please have some ideas for me!

I do think that you need to look at your own code first. The problem to
me could indicate unintentional sharing of objects between Threads. Make
sure you are actually using the ThreadLocal everywhere, not caching it
in between.

Next would be to check if the identity (eg hashcode) of the proxy
connection object (returned by DBCP) is the same or different in each
debug print.
If those are different (for the same thread), it could be an indication
that you *are* sharing the connection between threads (if it does have
the same identity as the one from the other thread), or that you closed
the connection and obtained a new one from DBCP (which has the same
inner connection as the other thread had previously)

If the identities are the same, I would check if the FBConnection
wrapped by the proxy object always has the right identity (eg hashcode).
To be able to get the wrapped object, you may need to do additional
configuration of DBCP.

If all those identities are the same when the Thread seems to be jumping
between connections, that could indicate the Jaybird is somehow messing
up the attachment handle, but I think that is not very likely.

Mark
--
Mark Rotteveel