Subject Transaction Management Failing?
Author Robert Harris

Hi,

  Our app uses Delphi, Win32. Currently our customers only use release versions, most of which are 2.5.4.2. One uses version 3.x.  All the installations of firebird server use Superserver and may be installed on a range of O/S, all windows, either 32 or 64bit. We only use Firebird 32bit.

 

This problem has been occurring sporadically for years. All the way back to ver 1.5. It may occur 1 time a year or less, or several times a week over a month or 2. Then disappear. I have never been able to pin it down, no matter what test programs I wrote. I have posted it a couple of times on various knowledge sites, but no clues have come back.

 

Here is the issue, consider the following logic:

 

done  =  false;

While not done

   Try

 

StartTransaction;

InsertMyData;

CommitTransaction;

Done = true;

 

   Except

Begin

  RollbackTransaction.

End;

 

 

The logic is that straightforward. The InsertMyData inserts lines of an invoice or any other customer transaction into ledgers, each identified by a transaction number. The problem is that when the issue occurs, multiple copies of the data end up in the database, which based on the above logic should not be possible.

 

One customer who continuously suffers from this issue every month has a single invoice inserted into the database 500+ times a few weeks back, this is an exception as it normally only inserts a single duplicate. There was apparently no delay on the computer on which the transaction was generated (which was not the firebird server), which suggests that the sql was not transmitted multiple times and maybe server corruption has occured. Writing 500 of our invoices would generate a massive amount of sql statements and would take a number of minutes to send and process. The customer is quite sure that there was no delay. This customer I tried upgrading to fb3.x, still no improvement.

 

Some customers have it more than others. 1 such customer had it occurring every other day for six months, then it stopped, now it has re-started. Some customers have not experienced this for many years. When the problem is ‘active’ it may occur just a single time out of thousands of transactions. Customers are all on the same version of our software. Gfix/gbak etc.. make no difference.

 

From the exhaustive research into this problem I sort of narrowed it down to 2 potential causes, but they are impossible to prove.

 

  1. Server overload.
    1. Client Writes Data. Server is busy.
    2. Server process request but is slow to ack the transaction.
    3. Client does not hear back from the server in timeout period so issues a rollback and retries.

 

  1. Server corruption???

 

Just wondering if anyone has experienced this and can shed any light on this? It’s a really difficult one as it is totally random.

 

Thanks,

Robert Harris.