Subject: How to optimize usage of TIB_DataPump?
Author: Thomas Steinmaurer
Hello Jason,

I'm trying to get the most out of the TIB_DataPump, in a scenario where I'm
running in a 2PC transaction, purging data from a production database into a
log database. The data is a master (table: IBLM$OPERATIONLOG) / detail
(table: IBLM$COLUMNLOG) relationship. What I'm currently using is (everything
is instantiated at run-time, as I'm in a command-line tool here):

1) Single, read-write distributed transaction in tiConcurrency mode
across the two databases. All components etc. are attached to this
transaction.
2) The source/destination statements of the TIB_DataPump are TIB_Cursor
instances.
3) Transfer of the master records based on the following query:

select * from iblm$operationlog order by id asc

In the context of this operation, I'm using TIB_DataPump events to save the
start and end id of the processed operation log records in variables for
usage in 4). This way, and in combination with the tiConcurrency
transaction, I can avoid querying detail records per processed master
record, which results in a single detail query instead of a bunch of
detail queries per master record.
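
To make the idea concrete for other readers: per pumped master row I just
remember the first and last ID in two variables. The sketch below is
illustrative Delphi only — the event name (AfterPumpRow), the handler
signature, and the SrcCursor/FStartID/FEndID names are my own placeholders,
not necessarily the actual TIB_DataPump API:

```pascal
// Illustrative sketch only; the real TIB_DataPump event name may differ.
procedure TPumpTool.DataPumpAfterPumpRow(Sender: TObject);
var
  CurrentID: Int64;
begin
  // The source cursor is positioned on the master row just transferred.
  CurrentID := SrcCursor.FieldByName('ID').AsInt64;
  if FStartID = 0 then
    FStartID := CurrentID;  // remember the first pumped ID
  FEndID := CurrentID;      // always track the last pumped ID
end;
```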

4) Transfer the detail records based on the following query:

select * from iblm$columnlog where operationlog_id between :start_id and
:end_id order by id asc

5) Remove source database records via:

delete from iblm$operationlog where id between :start_id and :end_id

An ON DELETE CASCADE FK constraint takes care of removing the detail
records as well.
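
For reference, the cascading constraint looks roughly like this in standard
Firebird DDL (the constraint name FK_COLUMNLOG_OPERATIONLOG is just an
example, not necessarily what the schema actually uses):

```sql
ALTER TABLE IBLM$COLUMNLOG
  ADD CONSTRAINT FK_COLUMNLOG_OPERATIONLOG
  FOREIGN KEY (OPERATIONLOG_ID)
  REFERENCES IBLM$OPERATIONLOG (ID)
  ON DELETE CASCADE;
```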

6) Do a COMMIT of the 2PC transaction


I do see the following from the Trace API output:

a)

INSERT INTO IBLM$COLUMNLOG
( ID
, OPERATIONLOG_ID
, COLUMN_NAME
, OLD_VALUE
, NEW_VALUE
, OLD_VALUE_BLOB
, NEW_VALUE_BLOB )
VALUES
( ? /* ID */
, ? /* OPERATIONLOG_ID */
, ? /* COLUMN_NAME */
, ? /* OLD_VALUE */
, ? /* NEW_VALUE */
, ? /* OLD_VALUE_BLOB */
, ? /* NEW_VALUE_BLOB */ )
RETURNING ID, OPERATIONLOG_ID, COLUMN_NAME, OLD_VALUE, NEW_VALUE,
OLD_VALUE_BLOB, NEW_VALUE_BLOB


Is the RETURNING clause something TIB_DataPump really needs? I guess I
basically don't need it, especially as there can be BLOB columns
involved as well. So, I need this to be as light-weight as possible. ;-)


b)

I also see recurring executions of the detail records SELECT statement
with the same statement handle and transaction id:

select * from IBLM$COLUMNLOG
where operationlog_id between ? /* olid_start */ and ? /* olid_end */
ORDER BY id ASC


I'm not sure if I'm doing something wrong here, or if this is a sign that
IBO is fetching additional records from the TIB_Cursor result set.
Perhaps a FetchAll before executing the pump would make sense?


* Do you see any improvements in the usage of TIB_DataPump here?
* How would you implement an incremental commit mechanism? As everything,
even the SELECT statements, is bound to the single transaction, I guess I
have to call CommitRetaining, because a hard COMMIT would close my
SELECT cursors as well?
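
To make that question concrete, this is roughly the shape I have in mind.
Everything here is an illustrative sketch: PumpOneMasterRowWithDetails and
FBatchSize are hypothetical names of mine, and whether statements survive a
CommitRetaining without re-preparing is exactly the part I'm unsure about:

```pascal
// Illustrative sketch of incremental commits, not working code.
RowsSinceCommit := 0;
while not SrcCursor.Eof do
begin
  PumpOneMasterRowWithDetails;  // hypothetical helper wrapping steps 3)-5)
  Inc(RowsSinceCommit);
  if RowsSinceCommit >= FBatchSize then
  begin
    // CommitRetaining keeps the transaction context (and thus the open
    // SELECT cursors) alive, unlike a hard Commit.
    DistTransaction.CommitRetaining;
    RowsSinceCommit := 0;
  end;
end;
DistTransaction.Commit;  // final hard commit of the 2PC transaction
```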


Any comments or insights are appreciated. ;-)


Thanks!


--
With regards,
Thomas Steinmaurer (^TS^)
Firebird Technology Evangelist

http://www.upscene.com/

Do you care about the future of Firebird? Join the Firebird Foundation:
http://www.firebirdsql.org/en/firebird-foundation/