Subject: Re: [firebird-support] Re: Firebird Usage Load Problem
Author: Maurice Ling
Post date: 2005-07-15T14:45:03Z
Daniel Rail wrote (full message quoted below):

Why I do duplicate inserts is simple. Actually, there are triple
inserts: the original data is inserted into the PMC_ABSTRACT table, and
the first processed form goes into both the TEXT_REPLACE_ABSTRACT and
TEXT_REPLACE_ABSTRACT_UPRO tables. TEXT_REPLACE_ABSTRACT_UPRO is
more like a cache table. Currently, the PMC_ABSTRACT and
TEXT_REPLACE_ABSTRACT tables hold 870k records each.
In my program, there are 3 methods. Method A processes all of
PMC_ABSTRACT into TEXT_REPLACE_ABSTRACT. Method B processes all of
TEXT_REPLACE_ABSTRACT into the ML_SVO table. On 870k records, Method A
takes 4-5 days to run and Method B takes 6 weeks, so Methods A and B
are essentially for re-building the database from the original data.
Method C processes TEXT_REPLACE_ABSTRACT_UPRO into ML_SVO.
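To make the flow concrete, here is roughly what happens per record (an
SQL sketch only; the real tables have more columns, and the column names
below are made up):

    /* triple insert at load time (illustrative column names) */
    INSERT INTO PMC_ABSTRACT (PMC_ID, ABSTRACT_TEXT) VALUES (?, ?);
    INSERT INTO TEXT_REPLACE_ABSTRACT (PMC_ID, PROCESSED_TEXT) VALUES (?, ?);
    INSERT INTO TEXT_REPLACE_ABSTRACT_UPRO (PMC_ID, PROCESSED_TEXT) VALUES (?, ?);

    /* Method C then drains the cache table: read each row of
       TEXT_REPLACE_ABSTRACT_UPRO, insert the result into ML_SVO,
       and delete the processed row from TEXT_REPLACE_ABSTRACT_UPRO */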
Method C is called by my daily updating cron script. This scheme of
double data entry and deletion lets me reprocess any original data if I
need to, by manually inserting the record into the
TEXT_REPLACE_ABSTRACT_UPRO table, and it lets the daily run process and
update the ML_SVO table while leaving both PMC_ABSTRACT and
TEXT_REPLACE_ABSTRACT untouched.
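So re-queuing a single record for reprocessing is just a copy back into
the cache table, something like this (PMC_ID stands in for whatever the
real key column is):

    INSERT INTO TEXT_REPLACE_ABSTRACT_UPRO
      SELECT * FROM TEXT_REPLACE_ABSTRACT
      WHERE PMC_ID = 12345;  /* the record to redo */

and the next nightly Method C run picks it up and refreshes ML_SVO.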
You can also see this as a protection mechanism, just in case I screw up
something. My thesis is a research thesis (read: I add tables,
processing methods, and cached results as I go along), and the last
thing I want is a careless mistake that requires 7 weeks to rebuild my
database. I have to admit that this is not optimal, but I will have to
go with it. Sorry.
Thanks.
Cheers
Maurice
> Hi,
>
> At July 14, 2005, 20:39, Maurice Ling wrote:
>
> > script A: loads new data into the PMC_* tables, processes raw text
> > and inserts the results into the TEXT_REPLACE_ABSTRACT and
> > TEXT_REPLACE_ABSTRACT_UPRO tables. It does that 300 times. Then it
> > does a select * from TEXT_REPLACE_ABSTRACT_UPRO, processes each
> > record, inserts the results into the ML_SVO table, and deletes the
> > processed record from TEXT_REPLACE_ABSTRACT_UPRO. [This script is a
> > pure CPU eater. The FB process called from it runs at 99.7% without
> > any drop in intensity.]
>
> > What I find interesting is this...... When run non-concurrently, the
> > FB process uses >96% CPU; after about 36-42 hrs (scripts B and C will
> > take about 10 weeks to run), FB seems to behave itself and no longer
> > uses that much CPU. Wonder why? What happened at the beginning?
>
> Firebird is probably trying to do some garbage collection after the
> deletes from the table TEXT_REPLACE_ABSTRACT_UPRO, maybe when the
> script disconnects from the database and before Firebird unloads the
> database from memory. You could try what someone else already
> suggested: set sweep to zero to try to deactivate the garbage
> collection.
>
> Also, I have one question: why are you inserting into
> TEXT_REPLACE_ABSTRACT_UPRO when the data is processed thereafter and
> inserted into another table and then deleted from
> TEXT_REPLACE_ABSTRACT_UPRO? Why not simply process the data
> immediately and directly insert it into ML_SVO? If you can do it that
> way, then it would most likely reduce the CPU usage, since there
> wouldn't be any garbage collection.
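P.S. On the sweep suggestion: as far as I understand, the automatic
sweep is switched off with gfix rather than from SQL, along these lines
(the database path and credentials here are placeholders for my setup):

    gfix -housekeeping 0 -user SYSDBA -password masterkey localhost:/data/thesis.fdb

Setting the interval to 0 only stops the automatic sweep; the garbage
from the deleted TEXT_REPLACE_ABSTRACT_UPRO rows would still need to be
cleaned up, e.g. by scheduling a manual "gfix -sweep" at a quiet time.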