Subject Re: How can I skip an Insert ?
Author Stefan Renzewitz
--- In firebird-support@yahoogroups.com, "Adam" <s3057043@y...> wrote:
> Hi Stefan,
>
> Sounds like an interesting project.
>

Your replies are as interesting, thanks for the time you are taking to
give me some new inspirations how to optimize my database code!

> Well if you really required the performance, in this scenario you
> could get away with removing the unique constraint if you are
testing
> for it outside the database. Also the number of queries within the
> transaction does have an impact on the performance, and others will
be
> able to suggest a more optimum inserts between commits figure. You
are
> already using the fastest win32 FB server type.
>

I think this should be the last option if everything else failed and
I'm not happy with the performance. Original one of the major reasons
to use a true database was to be 100% sure the data is consistant. To
temporarly disable an index/primary key sounds more like an option for
this special situation.

Your remark about the queries and transaction sounds interesting! I
will try find out a little bit more about this.

> > This way I was able to reduce the import down to 40 seconds which
> > would take several minutes before. However, one commercial
competition
> > is doing it within 10 seconds. That's bugging me a little bit, but
> > then again I'm now using a true database which doesn't seem to
slow
> > down with an increasing database and I'm storing more information
than
> > the competition does.
>
> How much more are you storing? Is it significant enough to explain
the
> difference.

Well, of course I don't know for sure what exactly it is storing, but
I can make some assumptions. First of all, it seems to use a quite old
database system: BTree Filer 5.5x for Pascal/Delphi (http:
//sourceforge.net/projects/tpbtreefiler/). It looks like the program
is storing all white and black positions in a seperated file instead
of one. The same is true for the moves. In contrast I put all
positions into one table (of course?), but added a column
"ColorToMove".

Then I have one more dimension to consider. I allow the user to handle
several (sub-)repertoires in one database. Hence I added the column
and foreign key "RepertoireGUID" to my position table. The other
program has not such a feature so it certainly saves some time here,
but being able to divide a big-repertoire and its positions into
sub-repertoires is a nice feature which the users like quite a lot.

Beside that I'm not really storing much more I believe.

>
> Given that you are the only user, you could disable all the indexes
> during the batch of inserts, and re-enable it at the end. Keeping an
> index up to date does have a cost, and in this case you may be able
to
> get around it.
>

Good idea. I wonder a little bit how much time it will cost to
re-activate the index, but I will give it a try.

> Also, what is your system doing for 40 seconds (is it I/O or CPU
bound?)

Than checking and inserting was seperated it took about 9 seconds for
about 5.000 positions to check whether the position existed already or
not - just the stored procedure and nothing was inserted into the
database yet. Right now the whole database stuff takes about 30
seconds of the 40 overall seconds, 6 seconds for the quick check via
the arraylist (3 seconds for positions and 3 for moves). I have to add
that I'm importing positions and moves, but to keep the focus I left
out the move aspect in this discussion. However, maybe this is also an
issue as I'm inserting into the position and move table alternating
(but within the same transaction).

About 4 seconds are used for the parsing of the import file itself.

Than I had checking and inserting a position seperated it was quite
interesting to see that the 5.000 queries took the same time no matter
whether the database was totally empty or I inserted already the 5.000
positions. This lead for me to the conclusion that 9 seconds are the
absolute minimum for checking whether a position exists. Thus I did
not focus on the primary key or index anymore so much (OK, it is an
issue for inserting of course though). I thought it is the way I
access the database which is the bottleneck. This one is really
bugging me. If it takes already 9 seconds just to check the 5.000
positions whether they exist or not how can the competition be done
with the whole import within the same time? And I didn't check for the
moves yet nor did I insert any rows (not to mention the parsing
itself). I can't imagine the other program is really making a check
exist query. But maybe it is storing everything in a smarter way.

Anyway, we probably shouldn't focus too much on the competition, but
just thinking about some possible optimizations for my database code.
Like I said, the other progam has some other disadvantages and the
import time is important, but not decisive after all. Beside the
import everything else is fast and pretty satisfying.

Stefan