Subject Re: [firebird-support] delete query optimization question
Author Helen Borrie
At 09:40 AM 10/09/2004 -0700, you wrote:
>Hello all,
>
>So, after having 'set plan on' pointed out (and finding 'set stats on'),
>I'm trying to figure out the following situation.
>
>I have two tables, RESULTS and MEMO, where you can have many memos per
>result. There's a before delete and an after delete trigger on RESULTS;
>the before delete trigger removes any memos that reference the result
>that's about to be deleted.
>
>The results table has ~500,000 records, ~280,000 of which are due to a
>runaway script dumping junk into it. All the junk records have a 'ver'
>column of 1.0.0.126
>
>select count(result_id) from results where ver = '1.0.0.126' gives
>PLAN (RESULTS INDEX (RESULTS_VER))
>Elapsed time= 9.15 sec
>
>I haven't run "delete from results where ver = '1.0.0.126'" long enough
>to finish, but it went over 8 hours before I had to stop it for
>unrelated reasons.
>
>The following query
>'delete from results where result_id in (select first 10 result_id from
>results where ver = '1.0.0.126');'
>
>gives
>
>PLAN (RESULTS INDEX (RDB$PRIMARY5,RESULTS_VER))
>PLAN (RESULTS NATURAL)
>
>and has been running for ~10-20 minutes so far. Which plan refers to
>which part?

The first plan refers to the subquery, which is using the indexes on the
primary key (OK) and the version (not OK) for the search. The version index
is "not OK" because more than half of its nodes are in the same dup chain.

>Any tips about how to improve this?

You bet. There are a lot of things wrong with this, and it is going to be
pretty slow whatever you do, because of that long duplicate chain on the FK
of the dependent table. So disable the index on the version number for this
exercise (ALTER INDEX RESULTS_VER INACTIVE).
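
In isql, that would look something like this (re-activating the index once
the cleanup and sweep are done also rebuilds it from scratch):

ALTER INDEX RESULTS_VER INACTIVE;
COMMIT;

/* ...run the mass delete and the sweep here... */

ALTER INDEX RESULTS_VER ACTIVE;
COMMIT;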

First, re-entrant subqueries must be aliased. So, if your delete statement
had been OK (which it isn't), it should have been:

delete from results r1
where r1.result_id in (
  select first 10 r2.result_id
  from results r2
  where r2.ver = '1.0.0.126');


>Should I drop the
>triggers, delete all the memos with a by hand query (delete from memo
>where result_id in (select result_id from results where ver =
>'1.0.0.126') ?), delete the results, then add the triggers again?

Hard to say, since you didn't show us the triggers. In the normal run of
things, you shouldn't need delete triggers, since ON DELETE CASCADE in the
foreign key declaration already creates the required triggers. I think if
you have both a cascade and a Before Delete trigger hitting the dependent
table, you have more than one "galaxy" of recursion going on.
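
If you go the declarative route, the cascade is declared on the dependent
table's foreign key. The constraint and column names below are only a guess,
since you didn't post the MEMO DDL (and you would drop any existing FK
constraint on MEMO first):

ALTER TABLE MEMO
  ADD CONSTRAINT FK_MEMO_RESULTS
  FOREIGN KEY (RESULT_ID)
  REFERENCES RESULTS (RESULT_ID)
  ON DELETE CASCADE;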

However, using a re-entrant subquery of any sort on this mass-delete is
likely to be causing recursion. I assume you thought you needed this
SELECT FIRST 10 in order to "batch" the deletes in groups of 10. Oh dear
no. Do mass-deletes in batches, yes. Do it using SELECT FIRST, no.

I'd rethink the whole thing.

1. Remove any trigger code that hits the dependent records and use a
cascade action in the FK definition instead.
2. Control the sizes of the batches by submitting a regular DELETE
statement from an executable stored procedure with a counter loop (a sketch
follows after this list); or just from a client app with a counter loop
(much slower). Perform a hard commit at the end of each batch of n
deletions and then do a sweep when all is done.
3. A batch of 10 is overly conservative. See how you go with a batch size
of 5000-10,000.
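
As a rough sketch of point 2 - the procedure name, parameter and return
value are my own invention, so adapt to taste - something along these lines
deletes one batch per call:

SET TERM ^ ;

CREATE PROCEDURE DELETE_JUNK_RESULTS (BATCH_SIZE INTEGER)
RETURNS (DELETED_COUNT INTEGER)
AS
DECLARE VARIABLE ID INTEGER;
BEGIN
  DELETED_COUNT = 0;
  FOR SELECT RESULT_ID FROM RESULTS
      WHERE VER = '1.0.0.126'
      INTO :ID
  DO
  BEGIN
    DELETE FROM RESULTS WHERE RESULT_ID = :ID;
    DELETED_COUNT = DELETED_COUNT + 1;
    /* stop after one batch; EXIT ends the procedure */
    IF (DELETED_COUNT >= BATCH_SIZE) THEN EXIT;
  END
END ^

SET TERM ; ^

Then, from the client (or isql), call it repeatedly with a hard commit after
each call, until DELETED_COUNT comes back as zero:

EXECUTE PROCEDURE DELETE_JUNK_RESULTS (5000);
COMMIT;

...and do the sweep when all the batches are done.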

./heLen





>Thanks for any help,
>-emile