Subject | Best Practice Question - How many rows to update before commit?
---|---
Author | Myles Wakeham |
Post date | 2007-11-09T16:07:58Z |
I have a large data-load application that moves about 500,000 rows of data
from an ODBC source to Firebird 1.5, where it is stored in approximately 10
different tables. My load routine obtains the source data and then loops
through it, using an ODBC connection to FB to INSERT the data into the target tables.
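In outline, the loop looks roughly like the sketch below. This is a simplified Python/pyodbc illustration rather than my actual code (the DSN names, table, and column names are placeholders), and it leaves out the commit batching described further down:

```python
import pyodbc

# Placeholder DSNs and table/column names; the real job spans roughly 10 target tables.
src = pyodbc.connect("DSN=SourceData")
dst = pyodbc.connect("DSN=FirebirdTarget", autocommit=False)

src_cur = src.cursor()
dst_cur = dst.cursor()

# Pull the rows from the ODBC source and post them one at a time to Firebird.
src_cur.execute("SELECT id, name, amount FROM source_table")
for row in src_cur:
    dst_cur.execute(
        "INSERT INTO target_table (id, name, amount) VALUES (?, ?, ?)",
        row.id, row.name, row.amount,
    )

dst.commit()
dst.close()
src.close()
```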
It has been working very well for some time; however, since implementing it
on a client's server to run as a scheduled batch process, it has been showing
signs of failure.
What I'm seeing is that after getting through about 75% of the data load, it
appears to go into some continuous loop from which it never recovers. I see
activity on the FB server computer's hard drive, and continuous activity on
the data-loading client, but it never gets through the loop. It's very
strange, because I would have expected to see errors coming from FB that
would interrupt the load, but I don't see this. It's almost like it gets
stuck in a groove on one random record and never recovers from it. Yet I do
not see repetitive rows being posted to FB.
In attempting to debug this behavior, I checked my code: I go through the
data load for the different data-source tables, incrementing a counter for
each row posted, and after hitting 5,000 rows I force a commit to FB.
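Concretely, the batching in that loop looks roughly like this, continuing the connections and cursors from the sketch above (again Python/pyodbc as a stand-in for my real code, with the 5,000-row batch size as the value in question):

```python
BATCH_SIZE = 5000  # rows posted per transaction before forcing a commit

rows_in_batch = 0
for row in src_cur:
    dst_cur.execute(
        "INSERT INTO target_table (id, name, amount) VALUES (?, ?, ?)",
        row.id, row.name, row.amount,
    )
    rows_in_batch += 1
    if rows_in_batch >= BATCH_SIZE:
        dst.commit()        # hard commit; the driver starts a new transaction on the next INSERT
        rows_in_batch = 0

dst.commit()  # flush whatever is left in the final partial batch
```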
What is the best number of rows to post in a transaction before a commit
should be done? I'm sure that I probably need to commit more frequently, but
I'm trying to balance performance vs. reliability here. The data load takes
about 3 hours to finish, so frequent commits could have a dramatic effect on
the length of time that this has to run, and as it is run nightly in most
cases, I need to find a way to do this as quickly and reliably as possible.
Any thoughts are greatly appreciated.
Regards,
Myles
============================
Myles Wakeham
Director of Engineering
Tech Solutions USA, Inc.
Scottsdale, Arizona USA
www.techsol.org
Phone +1-480-451-7440
Try our new Outlook Utility 'Split Personality'
http://splitpersonality.techsol.org