Subject: Re: Batch insert performance
Author: Adam
--- In, Richard Wesley <hawkfish@...>
> Hi Sean -
> On Aug 23, 2006, at 14:19, Leyne, Sean wrote:
> >> I am getting about 1e6 rows/hr
> >
> > That works out to 277 rows per second.
> >
> > That's not terrible, but I have seen better.
> >
> >
> > Are there any triggers firing for the inserts?
> Nope.
> > How big is an average row? How many fields?
> In the customer example I have there are about 75 fields, about 1/4
> of which are varchar with a total width of about 225. The rest is a
> mix of various wide numeric types (mostly bigint, a few double
> precision and timestamp).

Hi Richard,

Just doing some rough math on your feedback: each of your rows is at
least 625 bytes, and I am not sure whether you have counted the 2-byte
length prefix on each varchar field (which would add roughly another
40 bytes), or whether you are using a multi-byte character set. Each
bigint, double and timestamp is another 8 bytes, and Firebird stores
additional housekeeping information against each record internally.

If you had only 4 bigint fields, your record size would be 32 bytes
(plus record overhead); in other words, your rows carry roughly 10-20x
more data than that simple 4-bigint table would. At the same byte
throughput, the narrow table would be inserting somewhere in the
region of 2.5 - 5K records per second, which may or may not be
reasonable depending on the hardware and configuration.
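
To make the arithmetic explicit (the field counts below are just my
reading of your description, so treat them as assumptions): roughly 19
varchars totalling about 225 bytes of declared width plus about 56
eight-byte fields gives

    56 x 8 bytes (bigint/double/timestamp)   ~ 448 bytes
    varchar data (declared width)            ~ 225 bytes
    varchar length prefixes (19 x 2)         ~  38 bytes
    ---------------------------------------------------
    raw data per row                         ~ 710 bytes

Against a 32-byte row of 4 bigints that works out to a factor of
roughly 10-20x (the exact number depends on how full the varchars
really are and how well they RLE-compress), and 277 rows/s x 10-20 is
where the 2.5 - 5K records per second figure comes from.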

I am not suggesting that the performance is good or acceptable, nor am
I suggesting that it cannot be improved by batching; I am just pointing
out some facts so that your expectations stay realistic.

> The actual layout of the data is not so important as I am looking for
> a general solution - not just one for this customer. We have
> Firebird embedded in our product for use as a local SQL OLAP engine
> and our typical usage scenario is to take a sample of a user's
> completely arbitrary database and load it into a local FB database
> for off-line analysis.
> >> Would
> >> batching the inserts improve things?
> >
> > How often are you committing the rows?
> Only at the end. I was wondering if updating the frequency would
> improve things. TFB mentions batching in the range 5-20k rows when
> you can't avoid indexing and I was wondering if there were other
> cases where this would apply?

Firebird maintains an undo log in memory in case you want to roll back.
Don't quote me on it, but I think after about 5K operations the server
decides this log is too large and abandons it. Given that this is an
'offline' process and you probably don't have to worry about atomicity
so much, you could adjust your code to commit every so often.
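
Just to illustrate the pattern (your loader presumably drives the
embedded engine from C++, so treat this only as a sketch: the
kinterbasdb driver, connection details, table, columns and the
source_rows() generator are all assumptions on my part), committing
every few thousand rows looks something like this with any DB-API
style driver:

    import kinterbasdb  # Python Firebird driver; any DB-API driver works similarly

    con = kinterbasdb.connect(dsn='localhost:/data/sample.fdb',
                              user='sysdba', password='masterkey')
    cur = con.cursor()

    BATCH = 5000   # commit interval, in the 5-20K range TFB suggests
    insert_sql = "INSERT INTO SAMPLE_DATA (ID, NAME, AMOUNT) VALUES (?, ?, ?)"

    count = 0
    for row in source_rows():        # source_rows() is a hypothetical generator
        cur.execute(insert_sql, row)
        count += 1
        if count % BATCH == 0:
            con.commit()             # keeps the in-memory undo log small
            cur = con.cursor()       # re-open the cursor on the new transaction

    con.commit()                     # flush the final partial batch
    con.close()

Since you said atomicity is not really a concern for this offline
load, the cost of a failure mid-load is just re-running it, so giving
up the single big transaction should not hurt you.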

One thing you might try which may buy you better speed is to put the
data into an 'external table' (a flat file on the server that Firebird
exposes as a table), then run an INSERT INTO ... SELECT FROM query to
pump it into the real table. At the very least, that avoids the
per-row round trips.
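
If you go that route, the rough shape is: write the sample as
fixed-width text to a flat file, declare that file as an external
table (CHAR columns are the safe choice there), and pump it across in
one statement. A sketch only, with made-up names and paths, and
assuming ExternalFileAccess has been enabled in firebird.conf (it
defaults to None):

    import kinterbasdb  # same hypothetical setup as the previous sketch

    con = kinterbasdb.connect(dsn='localhost:/data/sample.fdb',
                              user='sysdba', password='masterkey')
    cur = con.cursor()

    # Declare the flat file as a table of fixed-width CHAR fields.
    cur.execute("""
        CREATE TABLE LOAD_BUFFER EXTERNAL FILE '/data/load_buffer.dat' (
            ID      CHAR(12),
            NAME    CHAR(40),
            AMOUNT  CHAR(20),
            EOL     CHAR(1)   /* newline, only so the file stays readable */
        )
    """)
    con.commit()

    # ... write the sampled rows to /data/load_buffer.dat as fixed-width
    # records matching the layout above ...

    # One statement pumps everything into the real table inside the engine.
    cur.execute("""
        INSERT INTO SAMPLE_DATA (ID, NAME, AMOUNT)
        SELECT CAST(ID AS BIGINT),
               NAME,
               CAST(AMOUNT AS DOUBLE PRECISION)
        FROM LOAD_BUFFER
    """)
    con.commit()

The INSERT ... SELECT runs entirely inside the engine, so you pay one
statement's worth of overhead rather than one per row; depending on
your server version you may still want to strip the trailing blanks
that the CHAR fields pick up.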