Subject | [firebird-support] Re: How to update this table? EXPERIMENTAL RESULTS |
---|---|
Author | Tomasz Tyrakowski |
Post date | 2011-10-07T12:11:28Z |
Just to make sure, I've prepared a simple experiment to measure if
sub-selects in the EXISTS clause can behave differently depending on
what and how you select in the sub-query.
If you haven't been following this conversation, please skip the rest of
this letter as it won't probably mean much to you.
Experiment conditions.
Two tables:
create table t1 (
f1 integer not null,
longfield1 char(200) default 'ABCDEFGHIJ',
primary key(f1)
);
create table t2 (
f2 integer not null,
longfield2 char(200) default '0123456789',
primary key(f2)
);
The longfieldxx is added to check if selecting it in the sub-select
matters or not. Char(200) for a table with a few million records should
make a visible difference.
Three flavors of update with EXISTS clause were measured:
[1] (the original query, from Svein's post if I remember correctly)
update t1 a set longfield1='x'
where exists (
select * from t2 b where a.f1 = b.f2
);
[2] (an improvement suggested by Tom)
update t1 a set longfield1='x'
where exists (
select 1 from t2 b where a.f1 = b.f2
);
[3] (a further improvement suggested by me)
update t1 a set longfield1='x'
where exists (
select first(1) 1 from t2 b where a.f1 = b.f2
);
The queries [1]-[3] were executed three times each. After each execution
the transaction was rolled back.
The measurement was made for two different configurations of the
contents of t1 and t2.
[A]
Both t1 and t2 consisted of 2 million records. t1.f1 ranged from 1 to
2000000 and t2.f2 ranged from 2000001 to 4000000. It is easy to see,
that the intersection of t1 and t2 was empty, so the sub-selects in
[1]-[3] were empty.
Execution times (three executions for each query):
[1] 11.6s 11.7s 11.6s
[2] 11.6s 11.6s 11.6s
[3] 12.6s 12.6s 12.5s
[B]
Both t1 and t2 consisted of 2 million records, and both t1.f1 and t2.f2
ranged from 1 to 2000000, so this time the intersection of t1.f1 and
t2.f2 was maximal, i.e. consisted of all records.
Execution times (three executions for each query):
[1] 165s 162s 163s
[2] 162s 162s 162s
[3] 165s 165s 169s
Conclusion: using select 1 ... instead of select * ... in the sub-select
doesn't improve anything, while using select first(1) ... even slows
things down a bit. IMHO, that proves FB handles the sub-selects in an
efficient manner, i.e. doesn't retrieve irrelevant data and exits the
sub-select as soon as the first record arrives.
Sorry for this rather long post, but I wanted to share the results so
that some other people don't have to reinvent the wheel.
regards
Tomasz
--
__--==============================--__
__--== Tomasz Tyrakowski ==--__
__--== SOL-SYSTEM ==--__
__--== http://www.sol-system.pl ==--__
__--==============================--__
sub-selects in the EXISTS clause can behave differently depending on
what and how you select in the sub-query.
If you haven't been following this conversation, please skip the rest of
this letter as it won't probably mean much to you.
Experiment conditions.
Two tables:
create table t1 (
f1 integer not null,
longfield1 char(200) default 'ABCDEFGHIJ',
primary key(f1)
);
create table t2 (
f2 integer not null,
longfield2 char(200) default '0123456789',
primary key(f2)
);
The longfieldxx is added to check if selecting it in the sub-select
matters or not. Char(200) for a table with a few million records should
make a visible difference.
Three flavors of update with EXISTS clause were measured:
[1] (the original query, from Svein's post if I remember correctly)
update t1 a set longfield1='x'
where exists (
select * from t2 b where a.f1 = b.f2
);
[2] (an improvement suggested by Tom)
update t1 a set longfield1='x'
where exists (
select 1 from t2 b where a.f1 = b.f2
);
[3] (a further improvement suggested by me)
update t1 a set longfield1='x'
where exists (
select first(1) 1 from t2 b where a.f1 = b.f2
);
The queries [1]-[3] were executed three times each. After each execution
the transaction was rolled back.
The measurement was made for two different configurations of the
contents of t1 and t2.
[A]
Both t1 and t2 consisted of 2 million records. t1.f1 ranged from 1 to
2000000 and t2.f2 ranged from 2000001 to 4000000. It is easy to see,
that the intersection of t1 and t2 was empty, so the sub-selects in
[1]-[3] were empty.
Execution times (three executions for each query):
[1] 11.6s 11.7s 11.6s
[2] 11.6s 11.6s 11.6s
[3] 12.6s 12.6s 12.5s
[B]
Both t1 and t2 consisted of 2 million records, and both t1.f1 and t2.f2
ranged from 1 to 2000000, so this time the intersection of t1.f1 and
t2.f2 was maximal, i.e. consisted of all records.
Execution times (three executions for each query):
[1] 165s 162s 163s
[2] 162s 162s 162s
[3] 165s 165s 169s
Conclusion: using select 1 ... instead of select * ... in the sub-select
doesn't improve anything, while using select first(1) ... even slows
things down a bit. IMHO, that proves FB handles the sub-selects in an
efficient manner, i.e. doesn't retrieve irrelevant data and exits the
sub-select as soon as the first record arrives.
Sorry for this rather long post, but I wanted to share the results so
that some other people don't have to reinvent the wheel.
regards
Tomasz
--
__--==============================--__
__--== Tomasz Tyrakowski ==--__
__--== SOL-SYSTEM ==--__
__--== http://www.sol-system.pl ==--__
__--==============================--__