Subject Performance of deleting records based on another table
Author Joost van der Sluis
Hi all,

I finally realized today that I do not know how to properly remove
records based on another table.

Suppose I have two tables, one with data (called BIG), and one table
(SMALL) with only some id's that has to be removed from the BIG-table.

I cannot get this to work, without having Firebird looping over all
records in BIG naturally, once for each record in SMALL.

Example: (Firebrid 3.0.4, but also holds on 2.5)

--------------------------
SQL> connect "test.fdb";
Database: "test.fdb", User: JOOST
SQL> create table big(id integer primary key, data varchar(10));
SQL> create table small(id integer primary key);
SQL> insert into big values (1,'adfdvf');
SQL> insert into big values (2,'adfdvf');
SQL> insert into big values (3,'adfdvf');
SQL> insert into big values (4,'adfdvf');
SQL> insert into big values (5,'adfdvf');
SQL> insert into big values (6,'adfdvf');
SQL> insert into big values (7,'adfdvf');
SQL> insert into big values (8,'adfdvf');
SQL> insert into big values (9,'adfdvf');
SQL> insert into big values (10,'adfdvf');
SQL> insert into small values (5);
SQL> set planonly on;
SQL> delete from big where exists (select 1 from small where
small.id=big.id);

PLAN (SMALL INDEX (RDB$PRIMARY2))
PLAN (BIG NATURAL)
--------------------------

Naturally, when BIG is really big, this takes very, very long.

I've tried everything. 'where in' does not help, and I can not force it
to use some other plan either.

One possibility is to use an execute-block. This one is somewhat better
(not really - in case small is not that small):

--------------------------
SQL> set term |;
SQL> execute block as declare variable v integer;
CON> begin
CON> for select id from small into :v do delete from big where id=:v;
CON> end
CON> |

PLAN (BIG INDEX (RDB$PRIMARY1))
PLAN (SMALL NATURAL)
--------------------------

Finally, the only good solution I could find was this:

--------------------------
SQL> merge into big using small on (big.id=small.id) when matched then
update set data='remove';

PLAN JOIN (SMALL NATURAL, BIG INDEX (RDB$PRIMARY1))
SQL> delete from big where data='remove';
--------------------------

Is this really the only efficient way to remove data from one table,
based on another one? So set some temporary flag and then remove based
on that flag?

There must be another way.

Regards,

Joost