Subject: Re: [IBO] How to optimize usage of TIB_DataPump?
Author: Thomas Steinmaurer
Hi Jason,

See inline. Btw, I'm online on Skype if you have time.


> Comments inserted below.
>
>> one correction inline ...
>>
>>> I'm trying to get the most out of TIB_DataPump, in a scenario where I'm
>>> running inside a 2PC transaction, purging data from a production database
>>> into a log database. The data is a master (table: IBLM$OPERATIONLOG) /
>>> detail (table: IBLM$COLUMNLOG) relationship. What I'm currently using is
>>> (everything is instantiated at run-time, as I'm in a command-line tool here):
>>>
>>> 1) Single, read-write distributed transaction in tiConcurrency mode
>>> across the two databases. All components etc. are attached to this
>>> transaction.
>>> 2) Source/Destination statement of TIB_DataPump are TIB_Cursor.
>>> 3) Transfer of the master records based on the following query:
>>>
>>> select * from iblm$operationlog order by id asc
>>>
>>> In the context of this operation, I'm using TIB_DataPump events to save the
>>> start and end id of the processed operation log records in variables for
>>> use in 4). This way, and in combination with the tiConcurrency
>>> transaction, I can avoid querying detail records per processed master
>>> record, which results in a single detail query instead of a bunch of
>>> detail queries (one per master record).
>>>
>>> 4) Transfer the detail records based on the following query:
>>>
>>> select * from iblm$columnlog where operationlog_id between :start_id and
>>> :end_id order by id asc
>>>
>>> 5) Remove source database records via:
>>>
>>> delete from iblm$operationlog where id between :start_id and :end_id
>>>
>>> An ON DELETE CASCADE FK constraint takes care of removing the detail
>>> records as well.
>>>
>>> 6) Do a COMMIT of the 2PC transaction
>>>
>>>
>>> I do see the following from the Trace API output:
>>>
>>> a)
>>>
>>> INSERT INTO IBLM$COLUMNLOG
>>> ( ID
>>> , OPERATIONLOG_ID
>>> , COLUMN_NAME
>>> , OLD_VALUE
>>> , NEW_VALUE
>>> , OLD_VALUE_BLOB
>>> , NEW_VALUE_BLOB )
>>> VALUES
>>> ( ? /* ID */
>>> , ? /* OPERATIONLOG_ID */
>>> , ? /* COLUMN_NAME */
>>> , ? /* OLD_VALUE */
>>> , ? /* NEW_VALUE */
>>> , ? /* OLD_VALUE_BLOB */
>>> , ? /* NEW_VALUE_BLOB */ )
>>> RETURNING ID, OPERATIONLOG_ID, COLUMN_NAME, OLD_VALUE, NEW_VALUE,
>>> OLD_VALUE_BLOB, NEW_VALUE_BLOB
>>>
>>>
>>> Is the RETURNING part something TIB_DataPump really needs? I basically
>>> don't need it, I guess, especially as there can be BLOB columns
>>> involved as well. So I need this to be as light-weight as possible. ;-)
>
> No, it doesn't need this and it should be suppressed to improve performance.

A wild guess is that the RETURNING clause is added because I feed the
destination statement with:

select * from ... where 0 = 1

?
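
For what it's worth, the destination statement I'd like to end up with is
simply the parameterized INSERT from the trace output above, just without
the RETURNING clause. A sketch only (assuming the destination SQL can be
assigned explicitly instead of being derived from the "where 0 = 1" SELECT,
and using named parameters):

INSERT INTO IBLM$COLUMNLOG
( ID
, OPERATIONLOG_ID
, COLUMN_NAME
, OLD_VALUE
, NEW_VALUE
, OLD_VALUE_BLOB
, NEW_VALUE_BLOB )
VALUES
( :ID
, :OPERATIONLOG_ID
, :COLUMN_NAME
, :OLD_VALUE
, :NEW_VALUE
, :OLD_VALUE_BLOB
, :NEW_VALUE_BLOB )
/* no RETURNING clause, so no column values (and especially no BLOB
   values) are sent back per inserted row */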


>>> b)
>>>
>>> I also see recurring execution of the detail-records SELECT statement
>>> with the same statement handle and transaction id:
>>>
>>> select * from IBLM$COLUMNLOG
>>> where operationlog_id between ? /* olid_start */ and ? /* olid_end */
>>> ORDER BY id ASC
>>>
>>>
>>> Not sure if I'm doing something wrong here, or if this is a sign that
>>> IBO is fetching additional records from the TIB_Cursor result set?
>
> Perhaps you can get a callstack on this in order to see what it is up to?

That's my issue here. I have to transfer the data of a master/detail
relationship in my datapump scenario. The detail/child records are moved
in the DoAfterExecuteItem event handler of the master TIB_DataPump instance.

Perhaps there is a more efficient way to handle master/detail data when
transferring with TIB_DataPump?
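
One related thought (just an idea, and it assumes the pump always transfers
every master row visible to the snapshot): since the tiConcurrency
transaction gives a stable view anyway, the start/end ids could also be
fetched up front with a single aggregate query in the same transaction,
instead of collecting them in the event handler:

select min(id) as start_id
     , max(id) as end_id
from iblm$operationlog
/* same snapshot as the master pump source, so the range covers exactly
   the rows being transferred */

The detail query in 4) and the delete in 5) would then stay exactly as they are.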


>> Re-executing this statement comes straight from my logic and makes
>> perfect sense. Sorry.
>>
>>
>>> Perhaps a FetchAll before executing the pump makes sense?
>
> No, not when you are using TIB_Cursor as the source.
>
>
>>> * Do you see any possible improvements in my usage of TIB_DataPump here?
>>> * How would you implement an incremental commit thingy? As everything is
>>> bound to the single transaction, even the SELECT statements, I guess I
>>> have to call CommitRetaining, because a hard COMMIT would close my
>>> SELECT cursors as well?
>
> Yes, a hard commit would close your cursors.
>
> This might be the best way to take things in bite-sized chunks anyway.
>
> You do all the records you want for the batch, then commit hard, then
> reopen the cursors and start from where you last left off, and so on.

I have been thinking about this for a while now, but I'm not sure whether
the cost of re-opening the cursors outweighs the benefit, especially when
working with a largish master source table, which is queried via:

select * from ... order by pk_field

Assuming I have a commit interval of 1000, this leads to re-executing and
re-opening the ordered result set on the large master table (millions of
records) every 1000 transferred records.
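
Then again, if I understand the "start from where you last left off" part
correctly, the restart after each hard commit would not have to re-read the
whole ordered set if the master query is keyed on the last processed id.
A sketch (the ROWS clause assumes a server version that supports it, and
:last_id would be the highest master id committed in the previous batch):

select *
from iblm$operationlog
where id > :last_id   /* highest id transferred in the previous batch */
order by id asc
rows 1000             /* one commit interval per opened cursor */

With an index on id this should position near the last key instead of
scanning the whole table again, so the per-batch re-open cost might stay
acceptable.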

Perhaps using CommitRetaining for the commit interval isn't that evil in
this scenario, and the real culprit is the included RETURNING clause?


Thanks,
Thomas