Subject | Dealing with inserts from multiple transactions |
---|---|
Author | bobm_dashlogistics |
Post date | 2012-06-07T16:00:30Z |
I have multiple ETL processes that are inserting data at the same time. Some of the tables that this data is being inserted into have FK constraints that must be maintained during the ETL process. See pseudocode below for an example:
create table employee (
employee_id integer,
employee_nbr integer unique constraint,
employee_name varchar(50));
create table sales (
sales_id integer,
employee_id integer references employee.employee_id,
units integer);
create table time_clock (
time_clock_id integer,
employee_id integer references employee.employee_id,
hours integer);
During the ETL process for Sales, we need to verify that the Employee record already exists, and if not then we insert it before writing the Sales record so that we can satisfy the FK. The same process happens during the import of the TimeClock data.
The problem occurs when a new employee appears in both the Sales and TimeClock data and are processed at the same time in different transactions. The process looks like this:
T1: Start transaction
T2: Start transaction
T1: Process Sale record
T1: Verify Employee - does not exist
T1: Insert Employee record
T1: Insert Sale record
T1: Process next Sale record
T2: Process Timeclock record
T2: Verify Employee - does not exist (can't see other transaction)
T2: Insert Employee record - violation of unique constraint
Both transactions are using READ_COMMITTED at this point, which is of course the problem. I can't commit either transaction because the entire Sale or Timeclock dataset must succeed or fail.
Any ideas on how to do this type of process successfully?
Thank you.
create table employee (
employee_id integer,
employee_nbr integer unique constraint,
employee_name varchar(50));
create table sales (
sales_id integer,
employee_id integer references employee.employee_id,
units integer);
create table time_clock (
time_clock_id integer,
employee_id integer references employee.employee_id,
hours integer);
During the ETL process for Sales, we need to verify that the Employee record already exists, and if not then we insert it before writing the Sales record so that we can satisfy the FK. The same process happens during the import of the TimeClock data.
The problem occurs when a new employee appears in both the Sales and TimeClock data and are processed at the same time in different transactions. The process looks like this:
T1: Start transaction
T2: Start transaction
T1: Process Sale record
T1: Verify Employee - does not exist
T1: Insert Employee record
T1: Insert Sale record
T1: Process next Sale record
T2: Process Timeclock record
T2: Verify Employee - does not exist (can't see other transaction)
T2: Insert Employee record - violation of unique constraint
Both transactions are using READ_COMMITTED at this point, which is of course the problem. I can't commit either transaction because the entire Sale or Timeclock dataset must succeed or fail.
Any ideas on how to do this type of process successfully?
Thank you.