Subject: Re: Best Practice Question - How many rows to update before commit?
Author: Adam
--- In, "Myles Wakeham" <myles@...>
> I have a large data load application that moves about 500,000 rows
> of data from an ODBC source to Firebird 1.5, which is stored in
> approx. 10 tables. My load routine obtains the source data and then
> loops through it, using an ODBC connection to FB to INSERT data into
> the target tables.
> It has been working very well for some time, however on implementing
> it on a client's server to run as a scheduled batch process it is
> showing signs of failure.
> What I'm seeing is that after getting through about 75% of the data
> load, it appears to go into some continual loop from which it never
> recovers. I see activity on the FB server computer's hard drive, and
> continuous activity on the data loading client, but it never gets
> through the loop. It's very strange, because I would have expected
> to see errors coming from FB that would interrupt the load, but I
> don't see this. It's almost like it gets stuck in a groove on one
> random record and never recovers from it. Yet I do not see
> repetitive rows being posted to FB.

Are there any insert triggers on these tables? Is it possible that
about 75% of the way through, you are hitting a table with a lot of
garbage to be collected - possibly one on which a massive delete
operation was previously run?

As a reality check, I would, in a test environment, back up and then
restore the database (so there is no garbage) and disable any insert
triggers.

> In attempting to debug this behavior, I checked my code and noticed
> that I was going through the data load for the different data source
> tables, incrementing a counter for each row posted. After I hit
> 5,000 rows, I would force a commit to FB.
> What is the best number of rows to post in a transaction before a
> commit should be done? I'm sure that I probably need to commit more
> frequently, but I'm trying to balance performance vs. reliability
> here. The data load takes about 3 hours to finish, so frequent
> commits could have a dramatic effect on the length of time that this
> has to run, and as it is run nightly in some cases, I need to find a
> way to do this as quickly and reliably as possible.

Strictly speaking, commits should not be made for performance reasons,
but rather to mark the end of an atomic operation. IIRC, Firebird does
keep an in-memory undo log so it can roll back quickly if a client is
still there to request it, but it drops that structure after a certain
number of operations - from memory, around 5,000 to 10,000. There is,
however, a way to tell Firebird not to create this in-memory structure
(gbak does this). I am not sure exactly how that is done, but it may
be worth investigating.
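The batch-commit pattern under discussion can be sketched as follows. This is a minimal sketch, not the original poster's code: Python's built-in sqlite3 stands in for the ODBC/Firebird connection, and the table and column names are made up; only the 5,000-row batch size is taken from the post.

```python
import sqlite3

BATCH_SIZE = 5000  # rows per transaction, the figure from the original post


def load_rows(conn, rows):
    """Insert rows, committing every BATCH_SIZE rows so each transaction
    (and any server-side undo bookkeeping) stays bounded."""
    cur = conn.cursor()
    pending = 0
    for row in rows:
        cur.execute("INSERT INTO target (a, b, c) VALUES (?, ?, ?)", row)
        pending += 1
        if pending >= BATCH_SIZE:
            conn.commit()  # end of one atomic unit of work
            pending = 0
    if pending:
        conn.commit()      # commit the final partial batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (a INTEGER, b INTEGER, c INTEGER)")
load_rows(conn, ((i, i * 2, i * 3) for i in range(12500)))
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 12500
```

The point of the structure is that the batch size controls transaction length, not speed: each commit closes one atomic unit, and the final partial batch is committed explicitly so no rows are left uncommitted.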

A few questions, though, about your environment.

50 records per second seems rather unimpressive. My 3-year-old laptop
can get 6,000 records per second without too much effort on my part
for a prepared insert statement (and my databases live in a TrueCrypt
volume). At that rate, inserting 500,000 records takes under a minute
and a half.

Are you doing any slow transformations on each row?
Is the bottleneck on how quickly the ODBC source can give you the data
rather than Firebird itself?
What sort of data? My example was using 3 integer fields, but if you
are pumping in really wide records I could see it being slower.
Are there any indices being built as you import?
Is your import tool being run on the database server or is it pulling
or pushing data over a LAN?
Is the ODBC data source competing for I/O (e.g. a file on the same
physical disk as the Firebird database file)?
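One way to narrow down which of these factors is the culprit is to log the insert rate per batch: if the rate collapses at a particular table or row range, that points at server-side work (garbage collection, index growth) rather than a stuck client loop. A minimal sketch of such instrumentation, using sqlite3 as a stand-in and a made-up two-column table (the post's loader inserts row by row; batching via executemany here is just to keep the timing code short):

```python
import sqlite3
import time

BATCH_SIZE = 5000


def timed_load(conn, rows):
    """Insert rows in batches, recording the rate of each batch.

    A sudden, sustained drop in rows/s part-way through the load is
    the signature of server-side slowdown rather than a client hang.
    """
    cur = conn.cursor()
    batch, total, rates = [], 0, []
    for row in rows:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            start = time.perf_counter()
            cur.executemany("INSERT INTO t (a, b) VALUES (?, ?)", batch)
            conn.commit()
            rate = len(batch) / (time.perf_counter() - start)
            total += len(batch)
            rates.append(rate)
            print(f"{total:>8} rows, {rate:,.0f} rows/s")
            batch = []
    if batch:  # final partial batch
        cur.executemany("INSERT INTO t (a, b) VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total, rates


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
total, rates = timed_load(conn, ((i, i) for i in range(20_000)))
```

Running the real load with this kind of per-batch logging would show whether the stall at the 75% mark coincides with a specific table, which is the question the trigger/garbage-collection theory above needs answered.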