Subject Re: [IBO] How to optimize usage of TIB_DataPump?
Author Thomas Steinmaurer
Hi Jason,

>> Comments inserted below.
>>
>>> one correction inline ...
>>>
>>>> I'm trying to get most out of the TIB_DataPump, in a scenario where I'm
>>>> running in a 2PC transaction, purging data from a production into a log
>>>> database. The data is a master (table: IBLM$OPERATIONLOG) / detail
>>>> (table: IBLM$COLUMNLOG) relationship. What I'm currently using is (all
>>>> is instantiated at run-time, as I'm in a command-line tool here):
>>>>
>>>> 1) Single, read-write distributed transaction in tiConcurrency mode
>>>> across the two databases. All components etc. are attached to this
>>>> transaction.
>>>> 2) Source/Destination statement of TIB_DataPump are TIB_Cursor.
>>>> 3) Transfer of the master records based on the following query:
>>>>
>>>> select * from iblm$operationlog order by id asc
>>>>
>>>> In context of this operation, I'm using TIB_DataPump events to save the
>>>> start and end id of processed operation log records in variables for
>>>> usage in 4). This way, and in combination with the tiConcurrency
>>>> transaction, I can avoid querying detail records per processed master
>>>> record, which results in a single detail query, instead of a bunch
>>>> detail queries per master record.
>>>>
>>>> 4) Transfer the detail records based on the following query:
>>>>
>>>> select * from iblm$columnlog where operationlog_id between :start_id and
>>>> :end_id order by id asc
>>>>
>>>> 5) Remove source database records via:
>>>>
>>>> delete from iblm$operationlog where id between :start_id and :end_id
>>>>
>>>> An ON DELETE cascade FK constraint takes care on removing detail records
>>>> as well.
>>>>
>>>> 6) Do a COMMIT of the 2PC transaction
>>>>
>>>>
>>>> I do see the following from the Trace API output:
>>>>
>>>> a)
>>>>
>>>> INSERT INTO IBLM$COLUMNLOG
>>>> ( ID
>>>> , OPERATIONLOG_ID
>>>> , COLUMN_NAME
>>>> , OLD_VALUE
>>>> , NEW_VALUE
>>>> , OLD_VALUE_BLOB
>>>> , NEW_VALUE_BLOB )
>>>> VALUES
>>>> ( ? /* ID */
>>>> , ? /* OPERATIONLOG_ID */
>>>> , ? /* COLUMN_NAME */
>>>> , ? /* OLD_VALUE */
>>>> , ? /* NEW_VALUE */
>>>> , ? /* OLD_VALUE_BLOB */
>>>> , ? /* NEW_VALUE_BLOB */ )
>>>> RETURNING ID, OPERATIONLOG_ID, COLUMN_NAME, OLD_VALUE, NEW_VALUE,
>>>> OLD_VALUE_BLOB, NEW_VALUE_BLOB
>>>>
>>>>
>>>> Is the RETURNING part something TIB_DataPump really needs? I basically
>>>> don't need that I guess, especially as there can be BLOB columns
>>>> involved as well. So, I need that as light-weight as possible. ;-)
>>
>> No, it doesn't need this and it should be suppressed to improve performance.
>>
>>
>>>> b)
>>>>
>>>> I also see re-occuring execution of the detail records SELECT statement
>>>> with the same statement handle and transaction id:
>>>>
>>>> select * from IBLM$COLUMNLOG
>>>> where operationlog_id between ? /* olid_start */ and ? /* olid_end */
>>>> ORDER BY id ASC
>>>>
>>>>
>>>> Not sure, if I'm doing something wrong here, or if this is a sign that
>>>> IBO is fetching additional records from the TIB_Cursor result set?
>>
>> Perhaps you can get a callstack on this in order to see what it is up to?
>>
>>
>>> Re-executing this statement comes straight from my logic and makes
>>> perfect sense. Sorry.
>>>
>>>
>>>> Perhaps a FetchAll before executing the pump makes sense?
>>
>> No, not when you are using TIB_Cursor as the source.
>>
>>
>>>> * Do you see any improvements in the usage of TIB_DataPump here?
>>>> * How would you implement an incremental commit thingy? As everything is
>>>> bound to the single transaction, even the SELECT statements, I guess I
>>>> have to call CommitRetaining, because a hard COMMIT would close my
>>>> SELECT cursors as well?
>>
>> Yes, a hard commit would close your cursors.
>>
>> This might be the best way to take things in bite-sized chunks anyway.
>>
>> You do all the records you want for the batch and then commit hard and then
>> reopen the cursors and start from where you last left off and so on.
>>
>> Perhaps we can do a chat session and I can take a closer look at things.
>
> Beside exception handling in TIB_DataPump (see my other post), I'm also
> stuck in the things discussed above. Let me know when we can have a
> closer look on that (via Skype).

Any ideas? ;-)


Thanks,
Thomas