Subject | RE: [firebird-support] Speed optimization required for DELETE query |
---|---|
Author | Svein Erling Tysvær |
Post date | 2009-09-10T15:04:23Z |
Hi again, Vishal!
NOT EXISTS is superior to NOT IN, in fact, in recent versions of Firebird the optimizer tries to change queries with IN into EXISTS if possible. Why? Because [NOT] EXISTS stops immediately when it encounters a record, [NOT] IN <subselect> calculates the entire subselect for every potential record, in your example that means that the subselect is generated up to 1000 times with lots of intermediate resultsets.
I'm surprised that this query takes much time at all, you only have a handful of records in all of your tables and EXISTS is generally quite quick (it should at least be quicker than your original query). But for optimization, please tell us the PLAN generated and how selective the chosen indexes are and maybe we could find some further optimization.
It is deliberate that I wrote B.dt_AppliedDate >= '03/31/2007' and C.dt_date <= '03/31/2007'. Although I still assume that you actually want B.dt_AppliedDate > '03/31/2007' (and not '>='), that should do the same as your original query did. At least I think that you want to delete records unless there is registered a dt_AppliedDate in April 2007 or later for the current var_Empcode (and var_PGCode).
HTH,
Set
By the way, Set is my initials and my nickname in English communities. Norwegians call me Svein Erling, but that was too complicated when I studied in England.
-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Vishal Tiwari
Sent: 10. september 2009 15:31
To: firebird-support@yahoogroups.com
Subject: RE: RE: [firebird-support] Speed optimization required for DELETE query
Hi Set,
I think the query is taking much more time than expectation as i have not seen the query result so far coz its going on... as you sent.
Let me ask you one question, is it really required to use "exists" predicate.
because it returns true or false and not as set of records as "IN" predicate does.
Let me explain the total background.
We have total three table for this DELETE query.
And all the three tableS contain same empcode with more or less number of records.
Say for example we have following employee codes:
"EmpCode001","EmpCode002","EmpCode003","EmpCode004","EmpCode005",
"EmpCode006","EmpCode007","EmpCode008","EmpCode009","EmpCode001".
Now Table_A will contain only 10 records i.e. one record per employee.
Now Table_B (which maintains the history of each employee) contains
total 100 records(say for example) in this table.
Table_C contains total 1000 records(say for example).
One thing is common that all these 10 eployees data is present in all the three tables.
Now, say for example our sub query gives an output as employee codes as follows:
"EmpCode003","EmpCode004","EmpCode005"
Now, our intension is, not to delete these three employees records form the Table_C for the specified period,
i.e. even though these employees records are present for the period i.e. dt_Date <= '03/31/2007', we don't want to delete.
(Exxcept these three employee all employees data shhould be deleted.
After deleetion if we see the records then these three employees data
should be present in the Table_C for the period dt_Date <= '31/03/2009')
Here when we execute the query which you gave, deletes no records.
we were using IN predicate which compares record by record with sub query.
But IN predicate got some limitations with the values, means it must be less than or equal to 1,500 values.
Now what should be the best way to deal with this situation.
Thanks in Advance.
Vishal Tiwari...
NOT EXISTS is superior to NOT IN, in fact, in recent versions of Firebird the optimizer tries to change queries with IN into EXISTS if possible. Why? Because [NOT] EXISTS stops immediately when it encounters a record, [NOT] IN <subselect> calculates the entire subselect for every potential record, in your example that means that the subselect is generated up to 1000 times with lots of intermediate resultsets.
I'm surprised that this query takes much time at all, you only have a handful of records in all of your tables and EXISTS is generally quite quick (it should at least be quicker than your original query). But for optimization, please tell us the PLAN generated and how selective the chosen indexes are and maybe we could find some further optimization.
It is deliberate that I wrote B.dt_AppliedDate >= '03/31/2007' and C.dt_date <= '03/31/2007'. Although I still assume that you actually want B.dt_AppliedDate > '03/31/2007' (and not '>='), that should do the same as your original query did. At least I think that you want to delete records unless there is registered a dt_AppliedDate in April 2007 or later for the current var_Empcode (and var_PGCode).
HTH,
Set
By the way, Set is my initials and my nickname in English communities. Norwegians call me Svein Erling, but that was too complicated when I studied in England.
-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Vishal Tiwari
Sent: 10. september 2009 15:31
To: firebird-support@yahoogroups.com
Subject: RE: RE: [firebird-support] Speed optimization required for DELETE query
Hi Set,
I think the query is taking much more time than expectation as i have not seen the query result so far coz its going on... as you sent.
Let me ask you one question, is it really required to use "exists" predicate.
because it returns true or false and not as set of records as "IN" predicate does.
Let me explain the total background.
We have total three table for this DELETE query.
And all the three tableS contain same empcode with more or less number of records.
Say for example we have following employee codes:
"EmpCode001","EmpCode002","EmpCode003","EmpCode004","EmpCode005",
"EmpCode006","EmpCode007","EmpCode008","EmpCode009","EmpCode001".
Now Table_A will contain only 10 records i.e. one record per employee.
Now Table_B (which maintains the history of each employee) contains
total 100 records(say for example) in this table.
Table_C contains total 1000 records(say for example).
One thing is common that all these 10 eployees data is present in all the three tables.
Now, say for example our sub query gives an output as employee codes as follows:
"EmpCode003","EmpCode004","EmpCode005"
Now, our intension is, not to delete these three employees records form the Table_C for the specified period,
i.e. even though these employees records are present for the period i.e. dt_Date <= '03/31/2007', we don't want to delete.
(Exxcept these three employee all employees data shhould be deleted.
After deleetion if we see the records then these three employees data
should be present in the Table_C for the period dt_Date <= '31/03/2009')
Here when we execute the query which you gave, deletes no records.
we were using IN predicate which compares record by record with sub query.
But IN predicate got some limitations with the values, means it must be less than or equal to 1,500 values.
Now what should be the best way to deal with this situation.
Thanks in Advance.
Vishal Tiwari...
--- On Thu, 10/9/09, Vishal Tiwari <vishualsoft@...> wrote:
From: Vishal Tiwari <vishualsoft@...>
Subject: RE: RE: [firebird-support] Speed optimization required for DELETE query
To: firebird-support@yahoogroups.com
Date: Thursday, 10 September, 2009, 6:29 PM
Hi,
plz. check the query once again u wrote
at the end u wrote that
B.dt_AppliedDate >= '03/31/2007'
and in the begining reverse i.e. C.dt_date <= '03/31/2007'
is it okay.
plz. check once again.
regards
Vishal Tiwari...
Delete from Table_C C
where C.dt_date <= '03/31/2007'
And not exists(
select 1 from Table_A A
where A.var_Empcode = C.var_EmpCode and
(A.var_Status = 'I'
or A.var_Resigned_But_Not_Settled = 'Y')
and exists(
select 1 from Table_B B
where A.var_EmpCode = B.var_EmpCode
And A.var_PGCode = B.var_PGCode
and B.dt_AppliedDate >= '03/31/2007' ));
--- On Thu, 10/9/09, Svein Erling Tysvær <svein.erling. tysvaer@kreftreg isteret.no> wrote:
From: Svein Erling Tysvær <svein.erling. tysvaer@kreftreg isteret.no>
Subject: RE: RE: [firebird-support] Speed optimization required for DELETE query
To: "firebird-support@ yahoogroups. com" <firebird-support@ yahoogroups. com>
Date: Thursday, 10 September, 2009, 6:03 PM
Sorry again, your query should not contain doubly negated exists (finally, my brain awoke), the last should simply be exists:
Delete from Table_C C
where C.dt_date <= '03/31/2007'
And not exists(
select 1 from Table_A A
where A.var_Empcode = C.var_EmpCode and
(A.var_Status = 'I'
or A.var_Resigned_ But_Not_Settled = 'Y')
and exists(
select 1 from Table_B B
where A.var_EmpCode = B.var_EmpCode
And A.var_PGCode = B.var_PGCode
and B.dt_AppliedDate >= '03/31/2007' ));
Sorry for messing this up,
Set
-----Original Message-----
From: firebird-support@ yahoogroups. com [mailto:firebird- support@ yahoogroups. com] On Behalf Of Svein Erling Tysvær
Sent: 10. september 2009 14:31
To: firebird-support@ yahoogroups. com
Subject: RE: [firebird-support] Speed optimization required for DELETE query
When did my logic stop working? True and not true has never been true. Let me try again:
EmpCode002: EmpCode003:
True and True and
not exists (True and not exists(True and
not exists (False)) not exists(True) )
=
True and True and
not exists(True and not exists(True and
True) False)
=
True and not exists True and exists
=
Deleted Not deleted
I think this is how it works, although I find it much easier to write in SQL than the pseudocode I wrote above...
Set
-----Original Message-----
From: Svein Erling Tysvær
Sent: 10. september 2009 14:15
To: 'firebird-support@ yahoogroups. com'
Subject: RE: [firebird-support] Speed optimization required for DELETE query
I cannot answer for Helen's query (I'm not used to max without group by and was surprised that it didn't produce a syntax error), but here's a walkthrough of mine for EmpCode002 and EmpCode003:
Delete from Table_C C
where C.dt_date <= '03/31/2007'
/*Is the date early enough? True or both EmpCode002 and EmpCode003 */
And not exists(
select 1 from Table_A A
where A.var_Empcode = C.var_EmpCode and
(A.var_Status = 'I'
or A.var_Resigned_ But_Not_Settled = 'Y')
/*Do they exist with correct values in Table_A? True for both EmpCode002 and EmpCode003 */
and not exists( select 1 from Table_B B where A.var_EmpCode = B.var_EmpCode And A.var_PGCode = B.var_PGCode and B.dt_AppliedDate >= '03/31/2007' ));
/*Do they have a recent AppliedDate? True for EmpCode003, false for EmpCode002*/
So,
EmpCode002: EmpCode003:
True and True and
not (True and not (True and
not (False)) not (True))
=
True and True and
not (True and not (True and
True) False)
=
True and True and
not True not False
=
True False
=
Deleted Not deleted
Set