Subject | RE: [firebird-support] Speed optimization required for DELETE query |
---|---|
Author | Svein Erling Tysvær |
Post date | 2009-09-11T06:58:14Z |
Hi again, Vishal, I tried to answer yesterday, but it seems like I have some problems answering fb-support from my home computer. So here's a copy of what I wrote yesterday evening:
I think I'll have to write about IN <subselect> and EXISTS once more:
IN <subselect> is never quicker than EXISTS, but sometimes it can be equally quick (hence, I think EXISTS is better than IN <subselect> for speed, although I admit that some probably find IN <subselect> more similar to their way of thinking). EXISTS finds whether a record exists, IN <subselect> finds a result set. Both EXISTS and IN are executed once for every potential row, so when you say that the subselect takes 1-2 seconds, you have to multiply those 1-2 seconds by the number of records in C to find the total time the subselect may take when it is in your DELETE query (other conditions may reduce that time a bit, but since it is a correlated subquery, the result set may vary for each row in C). So, your subselect may, at worst, be calculated up to 18000 times and require up to 10 hours if you have 18000 records in C (18000*2 seconds). EXISTS may also have to be executed up to 18000 times, but it only has to return TRUE or FALSE each time, not 8000+ records. And, I'd prefer 18000 intermediate values to 144 million intermediate values (18000*8000)!
Having said this, I want to point out that the Firebird team has put some effort into avoiding this negative aspect of IN <subselect> so in many cases it is now equivalent to EXISTS. I don´t know enough about the engine to tell whether your case is one of these, but I suspect it is not.
As for the dates, you want to use less than with dt_date and greater than for dt_AppliedDate when using EXISTS inside NOT EXISTS.
HTH,
Set
I think I'll have to write about IN <subselect> and EXISTS once more:
IN <subselect> is never quicker than EXISTS, but sometimes it can be equally quick (hence, I think EXISTS is better than IN <subselect> for speed, although I admit that some probably find IN <subselect> more similar to their way of thinking). EXISTS finds whether a record exists, IN <subselect> finds a result set. Both EXISTS and IN are executed once for every potential row, so when you say that the subselect takes 1-2 seconds, you have to multiply those 1-2 seconds by the number of records in C to find the total time the subselect may take when it is in your DELETE query (other conditions may reduce that time a bit, but since it is a correlated subquery, the result set may vary for each row in C). So, your subselect may, at worst, be calculated up to 18000 times and require up to 10 hours if you have 18000 records in C (18000*2 seconds). EXISTS may also have to be executed up to 18000 times, but it only has to return TRUE or FALSE each time, not 8000+ records. And, I'd prefer 18000 intermediate values to 144 million intermediate values (18000*8000)!
Having said this, I want to point out that the Firebird team has put some effort into avoiding this negative aspect of IN <subselect> so in many cases it is now equivalent to EXISTS. I don´t know enough about the engine to tell whether your case is one of these, but I suspect it is not.
As for the dates, you want to use less than with dt_date and greater than for dt_AppliedDate when using EXISTS inside NOT EXISTS.
HTH,
Set
--- In firebird-support@yahoogroups.com, Vishal Tiwari wrote:
> Hi again, Set!
>
> Ya the way IN operator behaves, we want to get all the records and
> don't want to exit as soon as record matches.
>
> I thinks "EXISTS" operator what you are talking is best for search, but when the situation comes to comapre and then delete, that time what exactly the "EXISTS" will play the role.
>
> Here in this DELETE query we don't want to search and exit.
>
> More over we want to delete records except some records because some employees may have status inactive i.e. var_Satus = "I" or some employee might have resigned but they are not settled, so in this case we don't want to delete such employees records.
>
> As you wrote as per following statements, now you have to think and decide if we use "EXISTS" clause with this DELETE query that will give benifit ?
>
> you wrote :
>
> "Because [NOT] EXISTS stops[Vishal: we don't want to staop and exit] immediately when it encounters a record, [NOT] IN <subselect> calculates the entire subselect for every potential record [Vishal: because we want to prevent some records], in your example that means that the subselect is generated up to 1000 times with lots of intermediate resultsets[Vishal: Ya, Set, this is the worst thing and that it is needed and it is taking time, so what should be the best way to deal with this situations, i mean with this DELETE query]."
>
> Let us see how we get the master-blaster way to win this match with DELETE query.
>
> And yes we have to win in any case, at any cost.
>
> I hope you got the topic.
>
> Thanks and regards.
>
> Vishal Tiwari....
>
>
>
> --- On Thu, 10/9/09, Vishal Tiwari <vishualsoft@...> wrote:
>
>
> From: Vishal Tiwari <vishualsoft@...>
> Subject: RE: [firebird-support] Speed optimization required for DELETE query
> To: firebird-support@yahoogroups.com
> Date: Thursday, 10 September, 2009, 9:01 PM
>
>
>
>
>
>
> Hi Set,
>
> It seems to be very nice when talked to you.
>
> Ya, in my previous mail wrote the number of records for
> just an example.
>
> Following are the actual records per table.
>
>
> Table Table_C contains near about 83 thousand records.
> Table Table_A contains near about 18 thousand records.
> Table Table_B contains near about 19 thousand records.
>
> Sub query returns 8066 records withing 1 or 2 seconds.
>
> The idea behind this DELETE query is to purge the data for given period, but with this we don't want to delete some employees data for given period (As i mentioned in the last mail).
>
> With this record sets i given ur query, but that was taking time.
>
> And ha, about date... so it should be i.e. dt_Date <= '03/31/2007'
> i.e. date should be less than or equal to 31 March 2007 for C.dt_Date and also for
> B.dt_AppliedDate. No change in both date condition
>
> data is present for all employees in all the three tables.
>
> Only the employee code, which will be the result of subquery, those employees
> records we don't want to delete, for given condition, rest employees records should be deleted.
>
> Let me know if i failed to explain total background.
>
> Thanks once again.
>
> Vishal Tiwari....