Subject RE: [firebird-support] Speed optimization required for DELETE query
Author Svein Erling Tysvær
I was thinking about (note, I did change from 03/01/2007 to 03/31/2007):

Query No. 3:

Delete from Table_C C
Where C.dt_date <= '03/31/2007'
And not exists (
select * from Table_A A
where C.Var_EmpCode = A.var_EmpCode
and (A.var_Status = 'I'
or A.var_Resigned_ But_Not_Settled = 'Y')
and exists (
select * from Table_B B
where A.var_EmpCode = B.var_EmpCode
And A.var_PGCode = B.var_PGCode
And B.dt_AppliedDate > '03/31/2007'));

Using MAX without GROUP BY doesn't sound right and you do need a link to C within the NOT EXISTS, so I'm not surprised query no. 2 didn't yield any result (though I'd expect an error message)...

HTH,
Set

-----Original Message-----
From: firebird-support@yahoogroups.com [mailto:firebird-support@yahoogroups.com] On Behalf Of Vishal Tiwari
Sent: 11. september 2009 14:17
To: firebird-support@yahoogroups.com
Subject: RE: [firebird-support] Speed optimization required for DELETE query

Hi Set,

I was talking about the queries i worked with

Following query no. 1 deletes certain records (using IN predicate),
where as query no. 2 deletes no records (using EXISTS predicate).
Was I wrong with sql made using EXISTS predicate ?

Query No.1:

Delete from Table_C C
where
C.dt_date <= '03/01/2007'
And
C.var_EmpCode not In (
select var_EmpCode from Table_A A
join Table_B B
On
A.var_EmpCode = B.var_EmpCode
And
A.var_PGCode = B.var_PGCode
And
(A.var_Status = 'I'
or
A.var_Resigned_ But_Not_Settled = 'Y') )
and Max(B.dt_AppliedDat e) <= '03/31/2007');

Query No.2:

Delete from Table_C C
where
C.dt_date <= '03/01/2007'
And
not exists (
select var_EmpCode from Table_A A
join Table_B B
On
A.var_EmpCode = B.var_EmpCode
And
A.var_PGCode = B.var_PGCode
And
(A.var_Status = 'I'
or
A.var_Resigned_ But_Not_Settled = 'Y') )
and Max(B.dt_AppliedDat e) <= '03/31/2007');

--- On Fri, 11/9/09, Svein Erling Tysvær <svein.erling.tysvaer@...> wrote:


From: Svein Erling Tysvær <svein.erling.tysvaer@...>
Subject: RE: [firebird-support] Speed optimization required for DELETE query
To: "'firebird-support@yahoogroups.com'" <firebird-support@yahoogroups.com>
Date: Friday, 11 September, 2009, 4:21 PM






Hi, Vishal!

[NOT] EXISTS will perform its check for all 18000 records in C, it just does not care to find all 8000 matches in A and B. I'm surprised that the result was different when using IN and EXISTS - admittedly, I did a mistake when I first used two NOT EXISTS, but the query with NOT EXISTS...EXISTS. .. that I wrote a bit later ought to have given the same result as your IN query.

Maybe I'll find some spare time this evening to look at the query again (and maybe even try to create some tables and see the result) to see for myself whether there is a difference between your original SQL and my modified version. But I still wonder about the PLAN your and my version produces, if I produce test data myself, odds are that I do not manage to duplicate the PLAN on your system (due to different selectivity for various fields).

Set

-----Original Message-----
From: firebird-support@ yahoogroups. com [mailto:firebird-support@ yahoogroups. com] On Behalf Of Vishal Tiwari
Sent: 11. september 2009 12:19
To: firebird-support@ yahoogroups. com
Subject: RE: [firebird-support] Speed optimization required for DELETE query

Hi again, Set!

EXISTS finds whether a record exists, agree but it checks for all the records or if it finds first mach then it exits.

Because with small dataset i tried the result set to be delete with IN prdicate was correct but with exists it was zero, that why i put first len about exists.

regards

Vishal Tiwari.

--- On Fri, 11/9/09, Svein Erling Tysvær <svein.erling. tysvaer@kreftreg isteret.no> wrote:

From: Svein Erling Tysvær <svein.erling. tysvaer@kreftreg isteret.no>
Subject: RE: [firebird-support] Speed optimization required for DELETE query
To: "firebird-support@ yahoogroups. com" <firebird-support@ yahoogroups. com>
Date: Friday, 11 September, 2009, 12:28 PM

Hi again, Vishal, I tried to answer yesterday, but it seems like I have some problems answering fb-support from my home computer. So here's a copy of what I wrote yesterday evening:

I think I'll have to write about IN <subselect> and EXISTS once more:

IN <subselect> is never quicker than EXISTS, but sometimes it can be equally quick (hence, I think EXISTS is better than IN <subselect> for speed, although I admit that some probably find IN <subselect> more similar to their way of thinking). EXISTS finds whether a record exists, IN <subselect> finds a result set. Both EXISTS and IN are executed once for every potential row, so when you say that the subselect takes 1-2 seconds, you have to multiply those 1-2 seconds by the number of records in C to find the total time the subselect may take when it is in your DELETE query (other conditions may reduce that time a bit, but since it is a correlated subquery, the result set may vary for each row in C). So, your subselect may, at worst, be calculated up to 18000 times and require up to 10 hours if you have 18000 records in C (18000*2 seconds). EXISTS may also have to be executed up to 18000 times, but it only has to return TRUE or FALSE each time, not 8000+
records. And, I'd prefer 18000 intermediate values to 144 million intermediate values (18000*8000) !

Having said this, I want to point out that the Firebird team has put some effort into avoiding this negative aspect of IN <subselect> so in many cases it is now equivalent to EXISTS. I don´t know enough about the engine to tell whether your case is one of these, but I suspect it is not.

As for the dates, you want to use less than with dt_date and greater than for dt_AppliedDate when using EXISTS inside NOT EXISTS.

HTH,
Set

--- In firebird-support@ yahoogroups. com, Vishal Tiwari wrote:
> Hi again, Set!
>
> Ya the way IN operator behaves, we want to get all the records and
> don't want to exit as soon as record matches.
>
> I thinks "EXISTS" operator what you are talking is best for search, but when the situation comes to comapre and then delete, that time what exactly the "EXISTS" will play the role.
>
> Here in this DELETE query we don't want to search and exit.
>
> More over we want to delete records except some records because some employees may have status inactive i.e. var_Satus = "I" or some employee might have resigned but they are not settled, so in this case we don't want to delete such employees records.
>
> As you wrote as per following statements, now you have to think and decide if we use "EXISTS" clause with this DELETE query that will give benifit ?
>
> you wrote :
>
> "Because [NOT] EXISTS stops[Vishal: we don't want to staop and exit] immediately when it encounters a record, [NOT] IN <subselect> calculates the entire subselect for every potential record [Vishal: because we want to prevent some records], in your example that means that the subselect is generated up to 1000 times with lots of intermediate resultsets[Vishal: Ya, Set, this is the worst thing and that it is needed and it is taking time, so what should be the best way to deal with this situations, i mean with this DELETE query]."
>
> Let us see how we get the master-blaster way to win this match with DELETE query.
>
> And yes we have to win in any case, at any cost.
>
> I hope you got the topic.
>
> Thanks and regards.
>
> Vishal Tiwari....
>
>
>
> --- On Thu, 10/9/09, Vishal Tiwari <vishualsoft@ ...> wrote:
>
>
> From: Vishal Tiwari <vishualsoft@ ...>
> Subject: RE: [firebird-support] Speed optimization required for DELETE query
> To: firebird-support@ yahoogroups. com
> Date: Thursday, 10 September, 2009, 9:01 PM
>
>
>
>
>
>
> Hi Set,
>
> It seems to be very nice when talked to you.
>
> Ya, in my previous mail wrote the number of records for
> just an example.
>
> Following are the actual records per table.
>
>
> Table Table_C contains near about 83 thousand records.
> Table Table_A contains near about 18 thousand records.
> Table Table_B contains near about 19 thousand records.
>
> Sub query returns 8066 records withing 1 or 2 seconds.
>
> The idea behind this DELETE query is to purge the data for given period, but with this we don't want to delete some employees data for given period (As i mentioned in the last mail).
>
> With this record sets i given ur query, but that was taking time.
>
> And ha, about date... so it should be i.e. dt_Date <= '03/31/2007'
> i.e. date should be less than or equal to 31 March 2007 for C.dt_Date and also for
> B.dt_AppliedDate. No change in both date condition
>
> data is present for all employees in all the three tables.
>
> Only the employee code, which will be the result of subquery, those employees
> records we don't want to delete, for given condition, rest employees records should be deleted.
>
> Let me know if i failed to explain total background.
>
> Thanks once again.
>
> Vishal Tiwari....

See the Web's breaking stories, chosen by people like you. Check out Yahoo! Buzz. http://in.buzz yahoo.com/

[Non-text portions of this message have been removed]

------------ --------- --------- ------

++++++++++++ +++++++++ +++++++++ +++++++++ +++++++++ +++++++++ +++++++++

Visit http://www.firebird sql.org and click the Resources item
on the main (top) menu. Try Knowledgebase and FAQ links !

Also search the knowledgebases at http://www.ibphoeni x.com

++++++++++++ +++++++++ +++++++++ +++++++++ +++++++++ +++++++++ +++++++++
Yahoo! Groups Links

















See the Web's breaking stories, chosen by people like you. Check out Yahoo! Buzz. http://in.buzz.yahoo.com/

[Non-text portions of this message have been removed]



------------------------------------

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Visit http://www.firebirdsql.org and click the Resources item
on the main (top) menu. Try Knowledgebase and FAQ links !

Also search the knowledgebases at http://www.ibphoenix.com

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Yahoo! Groups Links