Subject: Re: How can I skip an Insert?
Author: Adam
Hi Stefan,

Sounds like an interesting project.

> Indeed, the concrete business case can make things quite different. As
> you both went into the details I will provide you more info.
>
> I'm using a GUID as the key, so yes, it's not likely to create any
> duplicate entry. That's nothing I worry about. However, as I'm storing
> only unique chess positions, which are represented as a string (or
> fixed char), I have to take care that these positions exist only once
> in the database. I call this column FEN (Forsyth-Edwards Notation). I
> have a surrogate key (PositionGUID) and the natural candidate key FEN,
> which has a unique index. That's why I'm doing the exists check before
> I create a new chess position entry, only for the FEN.
>
> Important to know: I'm using embedded Firebird and there is only one
> user accessing the database. I'm offering my freeware chess program
> via download (www.chesspositiontrainer.com), so we don't have to take
> care of any possible concurrency situation. This allows us to think
> about some "special" solutions.
>
> For the moment I figured out the fastest implementation is to use a
> transaction and run a stored procedure which checks whether a position
> already exists (based on the FEN) and, if not, will add it to the
> database. It returns the PositionGUID. Now, I'm additionally handling
> an arraylist of these GUIDs and FENs in the background. Before I call
> the stored procedure I call Find(FEN) on the arraylist. If I already
> checked the position (based on the FEN) I don't have to run the stored
> procedure anymore. This is a major time saver.
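As an aside, a hash-keyed cache would make that in-memory check O(1) instead of the O(n) linear scan a Find() on an arraylist implies. A minimal sketch of the idea in Python (the function names and the stand-in for the stored-procedure call are hypothetical, not your actual API):

```python
import uuid

# Cache of positions already checked during this import run.
# A dict gives O(1) lookup, unlike a linear Find() on an arraylist.
seen_positions = {}  # FEN string -> position GUID


def insert_position(fen):
    # Stand-in for the real stored-procedure call that inserts the
    # position and returns its PositionGUID.
    return str(uuid.uuid4())


def get_or_insert(fen):
    if fen in seen_positions:      # duplicate: skip the database entirely
        return seen_positions[fen]
    guid = insert_position(fen)    # unseen: one round trip to the database
    seen_positions[fen] = guid
    return guid
```

The same shape is available in .NET via a Hashtable or Dictionary keyed by the FEN string.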

Well, if you really need the performance, in this scenario you could
get away with removing the unique constraint, since you are testing
for duplicates outside the database anyway. Also, the number of
queries within a transaction does have an impact on performance, and
others will be able to suggest an optimal inserts-per-commit figure.
You are already using the fastest win32 FB server type.
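To illustrate the inserts-per-commit idea, here is a sketch of committing every N rows rather than once per row or once at the very end. SQLite stands in for embedded Firebird here purely so the snippet is self-contained; BATCH_SIZE is a tuning knob to experiment with, not a recommended value:

```python
import sqlite3
import uuid

BATCH_SIZE = 1000  # inserts per commit -- a knob to tune, not a magic number

# SQLite stands in for embedded Firebird; the batching idea is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE position (guid TEXT PRIMARY KEY, fen TEXT)")


def bulk_insert(fens):
    for i, fen in enumerate(fens, start=1):
        conn.execute("INSERT INTO position VALUES (?, ?)",
                     (str(uuid.uuid4()), fen))
        if i % BATCH_SIZE == 0:
            conn.commit()  # keep each transaction to a bounded size
    conn.commit()          # flush the last, possibly partial, batch
```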

>
> I know this is not the standard solution, but a Find on the arraylist
> costs about 3 seconds for 14,000 queries, while it would take much
> more time to always run the query against the database. In my
> scenario the total number of database calls is reduced to 5,000
> thanks to my temporary arraylist. You can see that the import usually
> finds many duplicates which it has already checked/imported, and thus
> this additional arraylist makes sense.
>
> Besides that, like I said, I'm using one transaction and I don't
> dispose the command object used to call the stored procedure. I'm
> only changing the values of the parameters. After the import is done
> I call Commit().
>
> This way I was able to reduce the import to 40 seconds, where it
> would take several minutes before. However, one commercial competitor
> does it within 10 seconds. That's bugging me a little bit, but then
> again I'm now using a true database which doesn't seem to slow down
> as the database grows, and I'm storing more information than the
> competition does.
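For what it's worth, keeping one command object and only swapping parameter values amounts to preparing the statement once and re-executing it. A Python sketch of the same pattern, again with SQLite standing in for embedded Firebird (executemany compiles the INSERT once and re-runs it with fresh parameters):

```python
import sqlite3
import uuid

# SQLite as a self-contained stand-in for embedded Firebird.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE position (guid TEXT PRIMARY KEY, fen TEXT)")


def import_positions(fens):
    # The INSERT is prepared once and re-executed per row, mirroring
    # "keep the command object, only change the parameter values".
    rows = ((str(uuid.uuid4()), fen) for fen in fens)
    conn.executemany("INSERT INTO position VALUES (?, ?)", rows)
    conn.commit()  # one Commit() after the whole import, as described
```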

How much more are you storing? Is it significant enough to explain the
difference?

> Also, I'm using transactions and indexes, and I know that the
> "competition's" database quite often corrupts (the price of speed and
> of not using a true database, I think).
>

Given that you are the only user, you could disable all the indexes
during the batch of inserts and re-enable them at the end. Keeping an
index up to date does have a cost, and in this case you may be able to
get around it.
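In Firebird that is ALTER INDEX ... INACTIVE before the batch and ALTER INDEX ... ACTIVE afterwards (which rebuilds the index in one pass). SQLite, used here so the sketch is self-contained, has no INACTIVE state, so the snippet drops and recreates the index instead; the index and table names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE position (guid TEXT, fen TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_fen ON position (fen)")


def bulk_load(rows):
    # Firebird: ALTER INDEX idx_fen INACTIVE. SQLite has no equivalent,
    # so drop the index before the batch...
    conn.execute("DROP INDEX idx_fen")
    conn.executemany("INSERT INTO position VALUES (?, ?)", rows)
    conn.commit()
    # ...and rebuild it once at the end (Firebird: ALTER INDEX idx_fen
    # ACTIVE). One bulk rebuild beats per-row index maintenance.
    conn.execute("CREATE UNIQUE INDEX idx_fen ON position (fen)")
```

Note that while the unique index is inactive, nothing in the database enforces uniqueness, so this only works because you are already de-duplicating outside the database.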

Also, what is your system doing for those 40 seconds (is it I/O- or
CPU-bound)?

Adam