Subject Re: [firebird-support] Re: Firebird Usage Load Problem
Author David Johnson
On Thu, 2005-07-14 at 23:39 +0000, Maurice Ling wrote:
> script A: loads new data into PMC_* tables, process raw text and
> inserts results into TEXT_REPLACE_ABSTRACT and
> TEXT_REPLACE_ABSTRACT_UPRO tables. Does that 300 times. Then do a
> select * from TEXT_REPLACE_ABSTRACT_UPRO, process each record, then
> insert the results into ML_SVO table and delete the processed record
> from TEXT_REPLACE_ABSTRACT_UPRO table. [this script is a pure CPU
> eater. FB process called from this runs at 99.7% without any drop in
> intensity.]
>
I would like to dig into this script a bit.

Which part of this script eats the CPU? What is the timing of each
step?
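
If you do not have those numbers yet, here is a rough sketch of one way to collect them from inside the Python script. The helper names (timed, report) and the phase labels are mine, not anything from your script, and the loop body is only a stand-in for your real steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts, keyed by phase name.
phase_seconds = defaultdict(float)
phase_calls = defaultdict(int)


@contextmanager
def timed(phase):
    """Add the elapsed time of the enclosed block to the named phase."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[phase] += time.perf_counter() - start
        phase_calls[phase] += 1


def report():
    """Return (phase, seconds, calls) tuples, slowest phase first."""
    rows = [(name, phase_seconds[name], phase_calls[name])
            for name in phase_seconds]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Stand-ins for the real work; wrap the actual script steps instead.
    for _ in range(3):
        with timed("python: process text"):
            sum(i * i for i in range(10000))
        with timed("firebird: insert"):
            time.sleep(0.01)
    for name, seconds, calls in report():
        print("%-24s %8.3fs over %d calls" % (name, seconds, calls))
```

Wrapping each Python step and each Firebird call in its own "with timed(...)" block should show whether the time goes to the text processing or to the database round trips. Note that this measures wall-clock time on the client side, not CPU time in the server process.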

From your description, the script appears to have these distinct phases
(pseudocode follows):

repeat 300 times (more or less) {
    Python:   retrieve data from <<source: text files>>
    Firebird: insert data into PMC_* tables
    Python:   process text
    Firebird: insert row(s) into TEXT_REPLACE_ABSTRACT
    Firebird: insert row(s) into TEXT_REPLACE_ABSTRACT_UPRO
}

Firebird: select all rows from TEXT_REPLACE_ABSTRACT_UPRO
Python: for every record in result set (870k rows more or less) {
    Python:   process row text
    Firebird: insert row(s) into ML_SVO
    Firebird: delete row from TEXT_REPLACE_ABSTRACT_UPRO
}

How long does each part of the script take?
Can the script be multi-threaded?
Where are your commit points?
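
To illustrate what I mean by commit points, here is a minimal sketch of the second phase with one commit per batch of rows, rather than one per row or a single commit at the very end. Everything beyond the two table names is an assumption: the column names (id, txt, src_id, result), the batch size, and the process() placeholder are invented, and sqlite3 is used only as a stand-in so the example runs by itself. Your script would use its Firebird DB-API connection instead.

```python
import sqlite3

BATCH_SIZE = 1000  # hypothetical value; tune it by measurement


def process(text):
    """Placeholder for the real Python text processing."""
    return text.upper()


def drain_upro(conn, batch_size=BATCH_SIZE):
    """Move rows from TEXT_REPLACE_ABSTRACT_UPRO to ML_SVO in batches.

    Commits once per batch and returns the number of commits issued.
    """
    cur = conn.cursor()
    # Materialize the result set first so the deletes below do not
    # disturb an open SELECT on the same table.
    rows = cur.execute(
        "SELECT id, txt FROM TEXT_REPLACE_ABSTRACT_UPRO ORDER BY id"
    ).fetchall()
    commits = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cur.executemany(
            "INSERT INTO ML_SVO (src_id, result) VALUES (?, ?)",
            [(row_id, process(txt)) for row_id, txt in batch])
        cur.executemany(
            "DELETE FROM TEXT_REPLACE_ABSTRACT_UPRO WHERE id = ?",
            [(row_id,) for row_id, _ in batch])
        conn.commit()  # one commit point per batch
        commits += 1
    return commits


def make_demo_db(n_rows):
    """Build an in-memory database with made-up columns for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEXT_REPLACE_ABSTRACT_UPRO "
                 "(id INTEGER PRIMARY KEY, txt TEXT)")
    conn.execute("CREATE TABLE ML_SVO (src_id INTEGER, result TEXT)")
    conn.executemany(
        "INSERT INTO TEXT_REPLACE_ABSTRACT_UPRO VALUES (?, ?)",
        [(i, "row %d" % i) for i in range(n_rows)])
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_db(2500)
    print("commits issued:", drain_upro(demo))
```

Whether batching like this helps in your case depends on where the commits are now, which is why I am asking.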