Subject: Re: [firebird-support] Re: Batch insert performance
Author: Richard Wesley
Hey Adam -

On Aug 23, 2006, at 16:56, Adam wrote:

> Just doing some math on your feedback, each of your rows is at least
> 625 bytes, and I am not sure whether you have counted the 2 bytes
> overhead on each varchar field (which would be an additional 40
> bytes), or whether you are using some multi-byte character set. Each
> bigint, double and timestamp is another 8 bytes, and there are other
> pieces of information that must be stored against each record for
> Firebird internals.

I hadn't done the precise math (because this is a general-purpose
application), but this brings up another thing I was wondering
about. In another thread there was a discussion of the optimal page
size for OLAP being the maximum page size (at least for Oracle).
Does anyone know whether that is a reasonable idiom for FB? We do a
fair bit of aggregation (e.g. SUM) in some of our queries and that
seems to bog FB down sometimes.
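
(For what it's worth, the only place I know of to set that knob is at
creation time, so if bigger pages do help the aggregation queries we
would just create the local file differently. A rough sketch of what
I mean, using the C API we already link against -- the file name and
the 16K figure are placeholders, not a recommendation:

  #include <ibase.h>

  int main()
  {
      ISC_STATUS_ARRAY status;
      isc_db_handle db = 0;   // zeroed handles let CREATE DATABASE run
      isc_tr_handle tr = 0;
      char ddl[] = "CREATE DATABASE 'analysis.fdb' PAGE_SIZE 16384";

      if (isc_dsql_execute_immediate(status, &db, &tr, 0, ddl, 3, NULL))
          return 1;
      isc_detach_database(status, &db);
      return 0;
  }

An existing file could presumably be rebuilt onto bigger pages with a
gbak backup/restore cycle instead.)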

> If you had only 4 bigint fields, your record size would be 32 bytes
> (plus record overhead), in other words you are looking at a factor of
> 10-20x storage overhead over the simple table with 4 bigints. That
> would take you up to 2.5 - 5K records per second, which depending on
> the hardware and configuration may or may not be reasonable.
>
> I am not suggesting that the performance is good or acceptable, nor am
> I suggesting that it cannot be improved by batching, just pointing
> out some facts so that your expectations are reasonable.
>
>>
>> The actual layout of the data is not so important as I am looking for
>> a general solution - not just one for this customer. We have
>> Firebird embedded in our product for use as a local SQL OLAP engine
>> and our typical usage scenario is to take a sample of a user's
>> completely arbitrary database and load it into a local FB database
>> for off-line analysis.
>>
>>>> Would
>>>> batching the inserts improve things?
>>>
>>> How often are you committing the rows?
>>
>> Only at the end. I was wondering if committing more frequently would
>> improve things. TFB mentions batching in the range 5-20k rows when
>> you can't avoid indexing and I was wondering if there were other
>> cases where this would apply?
>
> Firebird maintains an undo log in memory in case you want to roll back.
> Don't quote me on it, but I think after about 5K operations the server
> decides this log is too large and abandons it. Given that this is an
> 'offline process' and you probably don't have to worry about atomicity
> so much, you can probably manipulate your code to commit every so
> often.

This sounds like something worth trying. Do I have to re-prepare a
statement if I call CommitRetaining?
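
What I have in mind is roughly the sketch below (table, column and
batch size are invented, error handling is elided; my understanding
is that isc_commit_retaining() keeps the transaction context alive,
so the prepared handle should remain usable without a re-prepare):

  #include <ibase.h>

  int main()
  {
      ISC_STATUS_ARRAY status;
      isc_db_handle db = 0;
      isc_tr_handle tr = 0;
      isc_stmt_handle stmt = 0;
      char dbname[] = "analysis.fdb";

      if (isc_attach_database(status, 0, dbname, &db, 0, NULL))
          return 1;
      if (isc_start_transaction(status, &tr, 1, &db, 0, NULL))
          return 1;

      isc_dsql_allocate_statement(status, &db, &stmt);
      char sql[] = "INSERT INTO sample_rows (val) VALUES (?)";
      isc_dsql_prepare(status, &tr, &stmt, 0, sql, 3, NULL);

      // Describe the single BIGINT parameter by hand.
      XSQLDA* in = (XSQLDA*) new char[XSQLDA_LENGTH(1)];
      in->version = SQLDA_VERSION1;
      in->sqln = in->sqld = 1;
      ISC_INT64 value = 0;
      short ind = 0;
      in->sqlvar[0].sqltype = SQL_INT64 + 1;   // +1 = nullable
      in->sqlvar[0].sqlscale = 0;
      in->sqlvar[0].sqllen = sizeof(value);
      in->sqlvar[0].sqldata = (char*) &value;
      in->sqlvar[0].sqlind = &ind;

      const int kBatch = 5000;   // guess at a batch size worth trying
      for (int i = 0; i < 1000000; ++i)
      {
          value = i;
          if (isc_dsql_execute(status, &tr, &stmt, 3, in))
              break;
          if ((i + 1) % kBatch == 0)
              isc_commit_retaining(status, &tr);   // same statement, no re-prepare
      }

      isc_dsql_free_statement(status, &stmt, DSQL_drop);
      isc_commit_transaction(status, &tr);
      isc_detach_database(status, &db);
      delete[] (char*) in;
      return 0;
  }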

> One thing you might try which may buy you better speed is to insert
> the data into an 'external table', then run an insert into select from
> query to pump it into the real table. This will avoid the round trips
> at least.
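
If I follow, you mean something along these lines (sketch only: the
file path, table and column names are invented, the data file would
have to be written out in FB's fixed-length external record layout
first, and I gather the server config has to permit external file
access):

  #include <ibase.h>

  int main()
  {
      ISC_STATUS_ARRAY status;
      isc_db_handle db = 0;
      isc_tr_handle tr = 0;
      char dbname[] = "analysis.fdb";

      if (isc_attach_database(status, 0, dbname, &db, 0, NULL))
          return 1;
      if (isc_start_transaction(status, &tr, 1, &db, 0, NULL))
          return 1;

      // Point an external table at the pre-written data file and
      // commit the DDL before touching it.
      char ddl[] =
          "CREATE TABLE import_raw EXTERNAL FILE '/tmp/rows.dat' "
          "(id BIGINT, label CHAR(20))";
      isc_dsql_execute_immediate(status, &db, &tr, 0, ddl, 3, NULL);
      isc_commit_transaction(status, &tr);

      // Pump everything into the real table in a single statement.
      if (isc_start_transaction(status, &tr, 1, &db, 0, NULL))
          return 1;
      char pump[] =
          "INSERT INTO sample_rows (id, label) "
          "SELECT id, label FROM import_raw";
      isc_dsql_execute_immediate(status, &db, &tr, 0, pump, 3, NULL);
      isc_commit_transaction(status, &tr);

      isc_detach_database(status, &db);
      return 0;
  }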

Since this is an embedded application, I'm not sure that going
through the API would be much slower than FB using its own API
internally. But it is an interesting point, and it has me wondering
about my own internal overhead, which involves converting UTF-16 to
UTF-8 for each string.

Anyway, thanks for the feedback.
________________________________________________________
Richard Wesley - Software Engineer
Tableau Software
http://www.tableausoftware.com/
hawkfish at tableausoftware dot com