Subject Re: [firebird-support] VB.Net select query hangs
Author Helen Borrie
At 03:42 AM 28/02/2009, you wrote:
>Hello everyone!
>I am a VB Developer, and my office currently uses Firebird 2.0 DB
>system. I have a recurring problem that I cannot seem to pinpoint.
>My current setup is the following:
>VB.Net 2005
>Firebird .Net provider 2.1.0
>Firebird 2.0 DB
>IBExpert 2009 version
>Legacy software using Firebird ODBC Data Provider 1.2.0.69

1. Firebird 2.0.what?
2. Classic or Superserver?
3. If Classic, are you running the Guardian?


>Heres the problem I experience. I have written several "patchwork"
>type windows applications that interface with the Firebird DB for my
>shop. Our shop also has legacy software that is constantly running
>in the background (24/7/365).

Does the legacy software connect to the Firebird database? Assuming yes...
Does this imply that there are periods when the users of the .NET apps are not likely to be connecting/connected?

>Intermittently, my software
>will "hang" on a relatively simple SQL query. My code successfully
>connects to the DB with connection string, but "hangs" when it
>reaches the first query within the code. This "hang" does not
>trigger any errors within the code, and it does not return any
>values, it will simply just sit there forever if I allowed it (I let
>it run once for 8 hours before killing the program). This seems to
>occur once per week up to once every other week. This problem
>requires a reboot to the DB Server in order to clear up.

It has the symptoms of a garbage collection problem. A full shutdown of the server would deal with long-running uncommitted transactions in that legacy software, which is your most likely offender, if I rightly understand that your "patchwork" apps are hit-and-run.

>I can physically place the query into SQL Editor in IBExpert, and the query
>will process just fine.

That would not be surprising, if the holdup for the application's transaction is that it drew the short straw and got landed with a heap of cooperative garbage collection, as it well might under Fb 2 Superserver if you left the default GCPolicy in place.

>Likewise, the legacy software seems to be unaffected.

Nevertheless, the legacy software may well be the cause of the problem, if it performs large chunks of updates and deletes in an ongoing CommitRetaining scenario and never hard commits anything. If it's rarely or never doing SELECT queries, it will never be in the position of cleaning up its own garbage. It will just go on doing what it does and snagging up everything else.
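
To illustrate what I mean by a hard commit, here is a minimal sketch against the .NET Provider - the connection string, the STOCK table and its columns are all invented placeholders, so treat it as a pattern rather than working shop code:

  ' Sketch only: assumes FirebirdSql.Data.FirebirdClient is referenced;
  ' connection string, table and column names are hypothetical.
  Imports FirebirdSql.Data.FirebirdClient

  Module CommitDemo
      Sub Main()
          Using conn As New FbConnection( _
              "DataSource=localhost;Database=C:\data\shop.fdb;" & _
              "User=SYSDBA;Password=masterkey")
              conn.Open()
              Dim tx As FbTransaction = conn.BeginTransaction()
              Try
                  Using cmd As New FbCommand( _
                      "UPDATE STOCK SET ON_HAND = ON_HAND - 1 WHERE ITEM_ID = 1", _
                      conn, tx)
                      cmd.ExecuteNonQuery()
                  End Using
                  ' Hard commit: the transaction ends and its old record versions
                  ' become eligible for garbage collection.
                  tx.Commit()
                  ' tx.CommitRetaining() would make the changes permanent but keep
                  ' the same transaction open - the pattern that pins the OAT.
              Catch
                  tx.Rollback()
                  Throw
              End Try
          End Using
      End Sub
  End Module

The point is not the SQL, it's the Commit(): an app that only ever does CommitRetaining never really ends its transaction, and the garbage behind it just piles up.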

You probably need to pay quite a lot of attention to that legacy app. It has the look of neglect about it, considering you're using a very ancient version of the ODBC driver with a relatively recent server version, and that you rarely have it offline. Drivers get updated for two main reasons: first, to support changes in the database engine and, second, to fix bugs. Five or six years is a long time to ignore those.

>I have tried to alter DB settings,

This doesn't tell us anything. What "DB settings"? From what to what? And if you've modified firebird.conf, do you stop and restart the server afterwards (SS) or have all users offline (Classic)?

>and I have been monitoring users/transactions within the DB, and there doesn't appear to be much of any rhyme or reason for the disruption, although,
>typically, there is about a 20k - 50k transaction gap between OAT and
>next transaction. The DB processes apx. 200k transactions per day.

This would not be a showstopping "gap" provided it was moving forward steadily and you were seeing the gap between the OIT (Oldest Interesting Transaction) and the Oldest Snapshot also moving forward steadily. On the other hand, if your "20k - 50k" describes a gap that gets progressively bigger as you go from one crisis to the next, then it's telling you you've got a problem.
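
If you want to keep an eye on those markers outside IBExpert, gstat's header report gives you the raw numbers (path and credentials here are placeholders):

  gstat -h -user SYSDBA -password masterkey C:\data\shop.fdb

The lines to watch over time are "Oldest transaction" (the OIT), "Oldest active" (the OAT), "Oldest snapshot" and "Next transaction".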

>
>I have tried to find info on how to release a limbo transaction, but
>there doesn't seem to be a manual way to do this

If you're only connecting to one database then you won't have limbo transactions. They only occur when a transaction across multiple databases gets interrupted by losing one of the connections.
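
(For the record, if you ever were in a multi-database scenario, gfix is the manual tool for limbo transactions: "gfix -list" shows any limbo transaction IDs, and "gfix -commit <id>" or "gfix -rollback <id>" resolves one, with the database path as the last argument. You shouldn't need any of that here.)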

>, and a sweep of the system does not change anything.

A sweep will release garbage that's eligible to be collected. Here it looks as if the problem is a progressively rising level of "stuck" garbage that remains ineligible for collection because that 24/7 app isn't doing the right thing with its transactions. You won't get a permanent solution until that is fixed. It's analogous to hosing flyspray all around the interior of the house with all the doors and windows open.
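
(If you're running the sweep by hand, the command-line form is something like this - path and credentials are placeholders:

  gfix -sweep -user SYSDBA -password masterkey C:\data\shop.fdb

and, as above, it can only remove garbage whose transactions have actually finished.)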

(Of course, I don't suggest for one moment that you ignore the possibility that your .NET Provider apps are offending in a similar way. There is an active forum for the .NET Provider, which you should use to raise any doubts you have about whether those apps are doing the right thing.)

For a temporary solution you need to schedule a full downtime as often as is needed - preferably BEFORE the drains get blocked - to run a full gbak backup. You should be doing backups anyway....but apparently not often enough to keep ahead of your garbage problem. By your account, once a week looks like a place to start.
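
The command-line form would be something along these lines (paths and credentials are placeholders; don't add the -g switch, because that tells gbak NOT to garbage collect while it reads):

  gbak -b -v -user SYSDBA -password masterkey C:\data\shop.fdb C:\backups\shop.fbk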

Why do I suggest a full downtime? A "hot" backup while that naughty transaction keeps on CommitRetaining its work won't let gbak clean out the garbage that it's holding on to. Shutting down the offending app's connection will at least end the currently offending transaction - I suspect it might be the only way to hard commit that app's work, ever.
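
gfix gives you the downtime switch as well - placeholders again; any transactions still running when it fires are rolled back:

  gfix -shut -force 0 -user SYSDBA -password masterkey C:\data\shop.fdb

then run the gbak backup, and bring the database back with:

  gfix -online -user SYSDBA -password masterkey C:\data\shop.fdb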

If you're running SS, you might also look at changing the GCPolicy to 'background' for a while, to see whether it's cooperative GC hits that are hanging up your .NET apps' selects, as I suspect. It should help to eliminate those apparent "hangs" that occur when an unsuspecting transaction gets landed with the coop GC, but it won't cure the underlying problem of transactions that never end.

GCPolicy only affects SS. If you're running Classic, there is no option for 'background' GC, so every connection that does a SELECT is potentially going to get landed with cleaning up after the 24/7 app (or whatever apps are failing to hard commit their work).
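
(The firebird.conf change itself is just one line - remove the leading # if it's still commented out:

  GCPolicy = background

followed by a stop/restart of the Superserver, as with any firebird.conf change.)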

./heLen