Subject Re: Best Practice Question - How many rows to update before commit?
Author Adam
> One other thing that I have noticed is that after about 8-9 of the
> tables being processed, I periodically get a disconnection to the
> database, losing the transaction and cursor and halting further
> INSERTS (each one fails after this time period).

This is interesting. If you are getting a disconnection at that point,
you would have a large number of uncommitted records that need to be
rolled back. The server flags them as rolled back, but at some stage
garbage collection needs to occur on those records. If your import
transaction is the first to touch them, it is probably the one doing
that cleanup, so your performance suffers. If you are using Firebird
1.5 or earlier and some of the indices on your table have lots of
duplicates, expect this effect to be exaggerated a hundredfold ;).
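
For what it's worth, this is also the reason the usual advice is to
commit every few thousand rows rather than running the whole load in
one transaction: a dropped connection then costs you at most one batch
of rollback and cleanup. A rough sketch of the shape of it, using
Java/JDBC with the Jaybird driver purely for illustration (the
connection string, table and column names are all invented, so treat
it as a template for whatever client library you are actually using):

import java.sql.*;

public class BatchedLoad {
    public static void main(String[] args) throws Exception {
        final int COMMIT_EVERY = 10000; // rows per transaction; tune to taste
        Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql://127.0.0.1:3050/C:/data/target.fdb",
                "SYSDBA", "masterkey");
        con.setAutoCommit(false); // we control the transaction boundaries
        PreparedStatement ins = con.prepareStatement(
                "INSERT INTO target_table (id, payload) VALUES (?, ?)");
        int pending = 0;
        for (int id = 1; id <= 500000; id++) { // stand-in for your source loop
            ins.setInt(1, id);
            ins.setString(2, "row data goes here");
            ins.executeUpdate();
            if (++pending == COMMIT_EVERY) {
                con.commit(); // bound the uncommitted record count
                pending = 0;
            }
        }
        con.commit(); // final partial batch
        ins.close();
        con.close();
    }
}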


> I'm not sure if there is a memory leak or something like that going
> on here with the data load, but I believe I may have to force a
> disconnect/reconnect for each table to force the connection to be
> active and free up resources.
>
> The database has been backed up and restored to ensure sweep time is
> minimal here. I have also set FORCE SWEEP settings to be off, so the
> sweep should be under my control.

Good.

>
> Now in specific answer to Adam's questions...
>
> > On 10-Nov-2007 07:43:37, firebird-support@yahoogroups.com wrote:
> > Are you doing any slow transformations on each row?
>
> No, there are no triggers or any form of data transforms being done
> in the INSERT statement.

Good. Tell us about your insert query. Are you preparing it once and
executing it multiple times? If not, I would check that you are not
wasting time re-preparing the same query over and over.
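
In JDBC terms (most APIs have the same split between preparing and
executing), the difference between the two looks like this; the SQL
and names are invented:

import java.sql.*;

class PrepareOnce {
    static final String INSERT_SQL =
            "INSERT INTO target_table (id, payload) VALUES (?, ?)";

    // Wasteful: the server parses and optimises the statement for
    // every single row.
    static void slow(Connection con, int rows) throws SQLException {
        for (int i = 1; i <= rows; i++) {
            PreparedStatement ps = con.prepareStatement(INSERT_SQL);
            ps.setInt(1, i);
            ps.setString(2, "data");
            ps.executeUpdate();
            ps.close();
        }
    }

    // Better: prepare once, then just bind new values and execute.
    static void fast(Connection con, int rows) throws SQLException {
        PreparedStatement ps = con.prepareStatement(INSERT_SQL);
        for (int i = 1; i <= rows; i++) {
            ps.setInt(1, i);
            ps.setString(2, "data");
            ps.executeUpdate();
        }
        ps.close();
    }
}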

>
> > Is the bottleneck on how quickly the ODBC source can give you the data
> > rather than Firebird itself?
>
> I suspect it is. I'll run some tests with different providers to see if
> this is the case.

If you suspect it is, are you able to create a test application that
holds a typical single record in memory and inserts that record
500,000 times? See whether this makes a significant improvement, to
make sure you are 'barking up the right tree'.
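
Something along these lines would do it (JDBC again, and the
connection details, table and 'typical record' are placeholders):

import java.sql.*;

public class InsertBench {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:firebirdsql://127.0.0.1:3050/C:/data/target.fdb",
                "SYSDBA", "masterkey");
        con.setAutoCommit(false);
        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO target_table (id, payload) VALUES (?, ?)");
        String record = "a typical row's worth of text"; // held in memory once
        long start = System.nanoTime();
        for (int i = 1; i <= 500000; i++) {
            ps.setInt(1, i);
            ps.setString(2, record);
            ps.executeUpdate();
            if (i % 10000 == 0) con.commit(); // commit in batches
        }
        con.commit();
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%.0f rows/second%n", 500000 / secs);
        ps.close();
        con.close();
    }
}

If this runs much faster than your real import, the bottleneck is on
the reading side, not in Firebird.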

>
> > What sort of data? My example was using 3 integer fields, but if you
> > are pumping in really wide records I could see it being slower.
>
> The data is about 75% text and 25% numeric. Some of the text fields
> can be very big (in about 50% of all cases there are fields that can
> be up to 30,000 characters in length, although it's rare to see data
> of this size being pumped over).

I modified my test application, and yes, when you start dealing with
records this size the insert speed drops drastically (to about 70
rows a second on mine). Not really a surprise, I suppose. What did
surprise me a little, though, is that I could only get about 3000
rows a second when inserting a couple of characters into a large
varchar field.

>
> > Are there any indices being built as you import?
>
> No, but I will test with a full index drop before doing the import
> to see if it helps.
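
Worth testing. If you go that route, script the drop and recreate
around the load so you can't forget to put the index back; building an
index once over the finished table is much cheaper than maintaining it
on every insert. A sketch (the index and table names are invented):

import java.sql.*;

class IndexToggle {
    static void loadWithoutIndex(Connection con) throws SQLException {
        Statement st = con.createStatement();
        st.execute("DROP INDEX idx_target_payload");
        con.commit(); // DDL takes effect on commit
        // ... run the bulk insert here ...
        st.execute(
            "CREATE INDEX idx_target_payload ON target_table (payload)");
        con.commit();
        st.close();
    }
}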
>
> > Is your import tool being run on the database server or is it pulling
> > or pushing data over a LAN?
>
> I have found a major improvement when I push the data over a LAN vs.
> running locally. Even with a local connection, however, I'm using
> FBCLIENT with TCP/IP to 127.0.0.1 and never the FBEmbed drivers.

This doesn't make a whole lot of sense to me unless your source data
and destination database sit on the same drive and that is really
hurting I/O.

> > Is the ODBC data source competing for I/O (eg file on the same
> > physical disk as the Firebird database file).
>
> Yes, in some cases it is. I believe on one of the target platforms
> the user has a RAID array for disk on that server, so how this is
> physically implemented I'm not sure, but I wouldn't be surprised if
> that was the case.

What type of RAID? If it is a simple mirror, then from a write
performance point of view it is no better (marginally worse, actually)
than a single drive.

If you have a bunch of 15K SCSI drives in a RAID 5, it probably isn't
such a bottleneck.

Adam