Subject: Re: Best Practice Question - How many rows to update before commit?
Author: Adam
Post date: 2007-11-11T22:44:12Z
> One other thing that I have noticed is that after about 8-9 of the tables
> being processed, I periodically get a disconnection to the database, losing
> the transaction and cursor and halting further INSERTS (each one fails after
> this time period).

This is interesting. If you are getting a disconnection at that point,
you would have a large number of uncommitted records that need to be
rolled back. The server would flag them as rolled back, but at some
stage, garbage collection would need to occur on these records. If the
import transaction you are running is the first to hit these records,
then it is probably the one doing this cleanup, so your performance
suffers. If you are using Firebird 1.5 or earlier, and some of the
indices on your table have lots of duplicates, this effect can be
exaggerated a hundred times over ;).
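
On the question in your subject line, the usual defence is to commit in
batches, so a dropped connection only ever leaves one batch of
uncommitted records behind. A rough sketch of what I mean (Python with
the fdb driver purely for illustration; the DSN, table name and
source_rows() are made up):

```python
import fdb

BATCH_SIZE = 10000  # rows per transaction; tune this against your own load

con = fdb.connect(dsn='127.0.0.1:/data/import.fdb',
                  user='SYSDBA', password='masterkey')
cur = con.cursor()

pending = 0
for row in source_rows():  # hypothetical reader over the ODBC source
    cur.execute("INSERT INTO TARGET_TABLE VALUES (?, ?, ?)", row)
    pending += 1
    if pending >= BATCH_SIZE:
        con.commit()  # caps what a disconnection can leave to roll back
        pending = 0

con.commit()
con.close()
```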
> I'm not sure if there is a memory leak or something like that going on
> here with the data load, but I believe I may have to force a
> disconnect/reconnect for each table to force the connection to be
> active and free up resources.
>
> The database has been backed up and restored to ensure sweep time is
> minimal here. I have also set FORCE SWEEP settings to be off, so the
> sweep should be under my control.

Good.

> Now in specific answer to Adam's questions...
>
> > On 10-Nov-2007 07:43:37, firebird-support@yahoogroups.com wrote:
> > Are you doing any slow transformations on each row?
>
> No, there are no triggers or any form of data transforms being done in
> the INSERT statement.

Good. Tell us about your insert query. Are you preparing it once and
executing it multiple times? If not, I would check that you are not
wasting time re-preparing the same query over and over.
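
Roughly the difference I mean, again in Python with fdb as a stand-in
(the table and columns are invented):

```python
import fdb

con = fdb.connect(dsn='127.0.0.1:/data/import.fdb',
                  user='SYSDBA', password='masterkey')
cur = con.cursor()

# Prepare once: the server parses and optimises the statement a single time.
ins = cur.prep("INSERT INTO TARGET_TABLE (ID, NAME, NOTES) VALUES (?, ?, ?)")

for row in source_rows():  # hypothetical ODBC reader
    cur.execute(ins, row)  # each execute reuses the prepared statement

con.commit()
con.close()
```

With some drivers, passing the raw SQL string to every execute call
re-prepares it each time, and over 500,000 rows that adds up.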
> > Is the bottleneck on how quickly the ODBC source can give you the data
> > rather than Firebird itself?
>
> I suspect it is. I'll run some tests with different providers to see if
> this is the case.

If you suspect it is, are you able to create a test application that
holds a typical single record in memory, then inserts that record
500,000 times? See whether that runs significantly faster than the real
import, to make sure you are 'barking up the right tree'.
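
Something along these lines (a sketch only; fdb again, a made-up
table, and it assumes the test table has no unique constraints):

```python
import time
import fdb

con = fdb.connect(dsn='127.0.0.1:/data/import.fdb',
                  user='SYSDBA', password='masterkey')
cur = con.cursor()

ins = cur.prep("INSERT INTO TARGET_TABLE (ID, NAME, NOTES) VALUES (?, ?, ?)")

ROWS = 500000
start = time.time()
for i in range(ROWS):
    # The record lives in memory, so the ODBC source is out of the picture.
    cur.execute(ins, (i, 'typical name', 'typical notes text'))
con.commit()

print("%.0f rows/second" % (ROWS / (time.time() - start)))
con.close()
```

If that runs dramatically faster than your real import, the bottleneck
is on the reading side rather than in Firebird.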
> > What sort of data? My example was using 3 integer fields, but if you
> > are pumping in really wide records I could see it being slower.
>
> The data is about 75% text and 25% numeric. Some of the text fields can
> be very big (in about 50% of all cases there are fields that can be up
> to 30,000 characters in length, although it's rare to see data of this
> size being pumped over).

I modified my test application, and yes, when you start dealing with
records this size, the insert speed drops drastically (to about 70
rows a second on mine). Not really a surprise, I suppose. What did
surprise me a little, though, is that I could only get about 3000 rows
a second when inserting a couple of characters into a large varchar
field.
> > Are there any indices being built as you import?
>
> No, but I will test with a full index drop before doing the import to
> see if it helps.
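
For what it is worth, in Firebird you do not have to drop regular
indices to get that effect; you can deactivate them and reactivate them
afterwards, which rebuilds each one in a single pass. A sketch (the
index name is made up, and this does not work on indices that back
primary or foreign key constraints):

```python
import fdb

con = fdb.connect(dsn='127.0.0.1:/data/import.fdb',
                  user='SYSDBA', password='masterkey')
cur = con.cursor()

cur.execute("ALTER INDEX IDX_TARGET_NAME INACTIVE")
con.commit()

run_import(con)  # hypothetical bulk load from the ODBC source

# Reactivating rebuilds the index once, instead of maintaining it on
# every single insert.
cur.execute("ALTER INDEX IDX_TARGET_NAME ACTIVE")
con.commit()
con.close()
```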
>
> > Is your import tool being run on the database server or is it pulling
> > or pushing data over a LAN?
>
> I have found a major improvement when I push the data over a LAN vs.
> running locally. Even with a local connection, however, I'm using
> FBCLIENT with TCP/IP to 127.0.0.1 and never the FBEmbed drivers.

This doesn't make a whole lot of sense to me unless you have your
source and destination data sources on the same drive and that is
really hurting I/O.
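
For reference, the difference between the two is just the connection
string (fdb once more; the paths are made up):

```python
import fdb

# Through fbclient over the TCP/IP loopback, as you are doing:
con = fdb.connect(dsn='127.0.0.1:/data/import.fdb',
                  user='SYSDBA', password='masterkey')
con.close()

# A direct local connection (no host prefix); with the embedded engine
# this bypasses the network layer entirely, but note that embedded
# takes an exclusive lock on the database file:
con = fdb.connect(dsn='/data/import.fdb',
                  user='SYSDBA', password='masterkey')
con.close()
```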
> > Is the ODBC data source competing for I/O (eg file on the same
> > physical disk as the Firebird database file).
>
> Yes, in some cases it is. I believe on one of the target platforms the
> user has a RAID array for disk on that server, so how this is
> physically implemented I'm not sure, but I wouldn't be surprised if
> that was the case.

What type of RAID? If it is a simple mirror, then it is no better from
a performance point of view (marginally worse actually) than a single
drive.
If you have a bunch of 15K SCSI drives in a RAID 5, it probably isn't
such a bottleneck.
Adam