Subject Re: How can I skip an Insert ?
Author Stefan Renzewitz
--- In, "Adam" <s3057043@y...> wrote:
> > I will stop there as the rest is not relevant.
> > YOUR example shows a generator?
> > The generator will NEVER supply the same number to two different
> > transactions, you would always get 5 and 6.
> >
> Of course not, but this is not my problem as I would not implement it
> this way. The problem is that Stefan was attempting to avoid unique
> constraints ever causing an exception by not inserting if he found
> a duplicate. The weakness in the model, as I pointed out, is that his
> transaction does not see all the records in the table, and therefore
> he cannot run a query within the context of his transaction that
> would identify the duplicate value under all circumstances. The
> preceding lookup, although it would certainly see records committed
> before his transaction started and could tell if he duplicated one of
> those values, could not categorically guarantee it was safe.
> Of course if you use a GUID of some form, then you are never going to
> have a problem with a primary key, BUT the important thing to remember
> is that this is true of ANY unique constraint, not just PKs.
> By the way, we do the exact same thing as you, creating a surrogate key
> instead of using the real data. I do not claim to have the only way or
> the best-fit solution to every problem in this approach. In my
> experience, people tend to fail when they think a one-size-fits-all
> solution exists, and that something that works well for one person will
> automatically work well for everyone. The cache table is a good idea
> if you are inserting data in high volume, but it would obviously mean
> you need to create a mechanism for dealing with the problem when it is
> eventually processed.
> Adam
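Adam's point above can be illustrated with a small sketch: instead of a SELECT-before-INSERT (which cannot see rows committed by other transactions after yours started), you attempt the INSERT and handle the constraint violation. This is a hypothetical example using Python's sqlite3 module rather than Firebird, and the table and column names are made up:

```python
# Attempt the insert and catch the unique-constraint violation, rather than
# pre-checking with a SELECT that cannot guarantee safety under concurrency.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, val TEXT UNIQUE)")

def insert_unique(conn, pk, val):
    """Try the insert; report a duplicate instead of pre-checking."""
    try:
        conn.execute("INSERT INTO t (pk, val) VALUES (?, ?)", (pk, val))
        return True           # row inserted
    except sqlite3.IntegrityError:
        return False          # duplicate primary key or unique value

print(insert_unique(conn, 1, "a"))  # True
print(insert_unique(conn, 2, "a"))  # False: UNIQUE constraint on val
```

The same pattern applies to any unique constraint, not just the primary key, which is exactly the caveat raised above.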

Thanks to you both for your replies! This discussion is very
interesting and useful for me.

Indeed, the concrete business case can make things quite different. As
you both went into the details, I will provide some more info.

I'm using a GUID as the key, so yes, it's not likely to create any
duplicate entry. That's nothing I worry about. However, as I'm storing
only unique chess positions which are represented as a string (or
fixed char), I have to take care that these positions only exist once
in the database. I call this column FEN (Forsyth-Edwards Notation). I
have a surrogate key (PositionGUID) and the natural candidate key FEN,
which is a unique index. That's why I'm doing the exists check before
I create a new chess position entry, only for the FEN.
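The lookup-or-insert logic of that stored procedure could be sketched like this. A minimal example, assuming a simple Position(PositionGUID, FEN) schema and using Python's sqlite3 and uuid modules in place of Firebird and the real procedure:

```python
# Look up the FEN; insert a new row with a fresh GUID only if it is unseen,
# and return the GUID either way -- the same contract as the stored procedure
# described in the post.
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Position (
                    PositionGUID TEXT PRIMARY KEY,
                    FEN          TEXT NOT NULL UNIQUE)""")

def get_or_add_position(conn, fen):
    """Return the GUID for this FEN, inserting a new row if it is unseen."""
    row = conn.execute(
        "SELECT PositionGUID FROM Position WHERE FEN = ?", (fen,)).fetchone()
    if row is not None:
        return row[0]                     # position already stored
    guid = str(uuid.uuid4())              # client-side surrogate key
    conn.execute("INSERT INTO Position (PositionGUID, FEN) VALUES (?, ?)",
                 (guid, fen))
    return guid

start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
g1 = get_or_add_position(conn, start)
g2 = get_or_add_position(conn, start)     # second call finds the existing row
assert g1 == g2
```

With a single user on an embedded database, this check-then-insert is safe; under concurrent writers it would need the exception-handling pattern discussed earlier.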

Important to know: I'm using embedded Firebird and there is only one
user accessing the database. I'm offering my freeware chess program
via download, so we don't have to take care of any possible
concurrency situation. This allows us to think about some "special"
solutions.

For the moment I figured out the fastest implementation is to use one
transaction and run a stored procedure which checks whether a position
exists already (based on the FEN) and, if not, will add it to the
database. It returns the PositionGUID. Additionally, I'm handling an
arraylist of these GUIDs and FENs in the background. Before I call
the stored procedure I call Find(FEN) on the arraylist. If I already
checked the position (based on the FEN) I don't have to run the stored
procedure anymore. This is a major time saver.

I know this is not the standard solution, but a Find on the arraylist
costs about 3 seconds for 14.000 queries, while it would take much
more time to always run the query against the database. In my scenario
the total number of database calls is reduced to 5.000 thanks to my
temporary arraylist. You can see that the import usually finds many
duplicates which it has already checked/imported, and thus this
additional arraylist makes sense.

Besides that, as I said, I'm using one transaction and I don't dispose
of the command object used to call the stored procedure. I'm only
changing the values of the parameters. After the import is done I call
Commit().
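The batching idea reads, in sketch form, like this. Again a hypothetical example with sqlite3 standing in for Firebird: one transaction for the whole import, the same parameterised statement reused with changing values, and a single commit at the end (sqlite3 happens to cache the compiled statement when the SQL string is reused, which loosely mirrors keeping one command object alive):

```python
# One transaction, one reusable parameterised statement, one final commit.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Position (PositionGUID TEXT PRIMARY KEY, FEN TEXT UNIQUE)")

rows = [("g1", "fen-a"), ("g2", "fen-b"), ("g3", "fen-c")]
cur = conn.cursor()
for guid, fen in rows:          # only the parameter values change per call
    cur.execute("INSERT INTO Position (PositionGUID, FEN) VALUES (?, ?)",
                (guid, fen))
conn.commit()                   # a single commit after the whole import

assert conn.execute("SELECT COUNT(*) FROM Position").fetchone()[0] == 3
```

Committing once at the end avoids paying the per-transaction overhead for each of the thousands of inserts.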

This way I was able to reduce the import to 40 seconds, where it used
to take several minutes. However, one commercial competitor does it
within 10 seconds. That's bugging me a little bit, but then again I'm
now using a true database which doesn't seem to slow down as the
database grows, and I'm storing more information than the competition
does. Also, I'm using transactions and indexes, and I know that the
"competition" database quite often corrupts (the price of speed and of
not using a true database, I think).

If anybody has an idea how to improve on this, I'm very interested!

Again thanks for your interesting replies!