firebird-support - Re: [firebird-support] OIT and OAT not moving

Subject	Re: [firebird-support] OIT and OAT not moving
Author	Helen Borrie
Post date	2004-12-17T07:57:35Z

At 11:45 AM 17/12/2004 +0800, you wrote:

>Gee Helen, thanks for making the effor for asking Roman yourself.

I didn't. I only pasted Roman's firebird-java response back here, since
the thread had come back here.

> I did post in the list as you suggested, but I didn't see it get through
>though.

It did.

>On Thu, 16 Dec 2004 14:40:45 -0500, Rick DeBay <rdebay@...>
>wrote:
> > If this is a Java application, commit() doesn't need to be called on
> > SELECT.

Why not? What is different about Java, that you suppose it doesn't need to
commit a transaction.

> > In fact, the default is auto-commit, unless the program turns it off for
> > the connection.

Autocommit causes a DML statement to be committed automatically when the
application submits an insert, update or delete statement. SELECT
statements also run in transactions. Nothing will cause that transaction
to "autocommit" if there are no state-changing statements in the same
transaction.

> Auto-commit would be turned back on when the connection
> > is returned to the connection pool.

I don't know how. The Jaybird driver's connection class is restricted to a
single transaction. It's more probable that the transaction gets created
according to defaults in the connection class; so if you leave a
connection instantiated in the pool, it will keep its original property
settings unless you explicitly change it.

> > If you aren't using a connection pool, one problem would be generating
> > new connections and then never closing them.

Under the Jaybird model, that appears to be possible. If that's a problem
in Jaybird, it must have something to do with the enforced existential
binding between the transaction and the connection. It's not true in terms
of the capabilities of the database server though. In well-behaved apps
with API interface layers that implement the connection and the transaction
as distinct classes, dead connections don't cause persistent problems with
the transaction inventory.

Jaybird seems to have its cleanup stuff pretty well covered. The fact is,
you're seeing behaviour - stuck and/or long-running transactions, crashes
during two-phase commits -- that you shouldn't see in an application that
takes care of things properly. Therefore, the problem lies somewhere else
than in the Jaybird layer. Suspect that the salmon framework and/or the
user application code is doing something untoward.

>Our application uses salmonllc's Sophia framework. Yes we use
>connection pooling, and yes auto-commit is on. I've gone through some
>of the application code and its quite messy. Some don't have commits,
>some don't release/free connections, and all this is staggered around
>5000 files of java code. In 'GNU top', some fb_inet_server processes
>could take up to 15 hours. I'm not sure whether its waiting for a
>connection, or its waiting for something else, or simply whether its a
>really huge query.

ps is more useful than top. Use ps | grep fb and you should be able to see
all of the current connections, rather than just the odd one that happens
to be eating a lot of cpu time, which is what top shows you. Top also eats
resources itself.

An fb_inet_server process is just a connection. It might be doing nothing
at all, but it won't go away until the client disconnects. The client
won't be able to disconnect if a user transaction is still running. I
don't think you have yet eliminated the possibility that it is a read-write
transaction running a SELECT statement somewhere that is holding up the
OAT. This might well be down to the common misconception that a SELECT
statement is somehow isolated from transaction control It very much
isn't. If you open a connection, i.e. with classic, start a server
process, to submit a SELECT statement and return a record set to the client
-- your typical desktop-style browsing interface -- then you have a running
transaction that will stay active until it's committed or rolled back
(preferably committed, since rollback doesn't immediately make the
transaction uninteresting).

>I'm not really the developer, and I haven't really
>read all those 5000 files and can't really tell how often these cases
>occur. Sooner or later the code has to be fixed, but that's going to
>take a while (technically as well as politically). Sorry for straying
>away from the topic.

It's not necessarily all in your application. If someone is watching from
a Windows workstation with a GUI tool, or is continually running a
reality-check query with isql open all day, and never committing, then
that's a long-running transaction.

At one point, you mentioned replication. Are you sure your replication
system isn't the culprit?

You're churning a group of pooled connections to classic server. Classic
does cooperative garbage collection, not a background GC process like
Superserver. Maybe your 15-hour process is one of those processes taking
the "GC hit". Maybe your application architecture just isn't suitable for
classic. Did you ever try it with Superserver?

Or it could be the automatic sweep kicking in and trying to find things to
clean up in the gap...

>Anyway, at first I thought that the huge 'gap' previously was the
>cause for performance issues. But then we fixed that, and we still
>have about 700M of free RAM, and we never swap. But performance is
>still an issue, so it probably wasn't because of the 'gap' after all.
>Really running out of ideas now. Being able to have a few workarounds
>while someone fixes the code is good enough for now.

It seems like a lot of stabbing in the dark. It's not even clear any more
what you mean by "performance issues", since the parameters of this setup
seem to be augmented every time you post.

>Appreciate the help everyone, and I hope more suggestions will come. :)

It could be worth trying with superserver instead. It could be worth
setting off autosweeping (set the sweep interval to 0) and replacing it
with sweeps scheduled to run at quiet times. It might gain you something
while you are waiting for someone to review the application software.

It would help a lot to describe what you are perceiving as "performance
issues" - identify the task that seems to cause the problems, describe what
it's doing and the numbers of rows and transactions involved.

./hb