Subject Query Optimization (Tables or Views)
Author Maurice Ling
Hi all,

I have a table which acts like a term-list defined index

create table jclone_occurrence_pmid(
pmid numeric(10,0) not null,
clone char(128) not null);

There are 18 million rows of records in this table.
My task now is to answer this question: what is the set of pmids that
has clone A and clone B?

So I ran a query:
select pmid from jclone_occurrence_pmid where clone = 'ACT' and pmid
in (select pmid from jclone_occurrence_pmid where clone = 'ACE')

This query took me 2 hours to run. The problem is, I have 13 million
of these queries, means it will take 3000 years to run. clearly
unacceptable.

So, I thought that part of the reason was that the query had to do a
self join (please tell me if I am wrong) which brings me to create
either a view or a cache table. Can anyone please kindly explain the
nature of a view in FB and how it helps?

DDL for the view:
create view jclone_occur_pair_view (clone1, clone2, pmid) as
select a.clone, b.clone, a.pmid
from jclone_occurrence_pmid a, jclone_occurrence_pmid b
where a.pmid=b.pmid;

DDL for the table:
create table jclone_occur_pair(
clone1 char(128) not null,
clone2 char(128) not null,
pmid numeric(10,0) not null);

"Select count(*) from jclone_occur_pair_view" gives me about 478
million rows, which is the same as doing an inner join "select
count(*) from jclone_occurrence_pmid a inner join
jclone_occurrence_pmid b on a.pmid=b.pmid".

But when I try to populate jclone_occur_pair table with the following SQL,
insert into jclone_occur_pair (clone1, clone2, pmid) select a.clone,
b.clone, a.pmid from jclone_occurrence_pmid a, jclone_occurrence_pmid
b where a.pmid=b.pmid;
commit;

My database ballooned from 6GB to 40GB and for some weird reasons,
died... isql just disappeared from process, using about 260MB of
memory (VSIZE in top). I do not eliminate the possibility of killing
the wrong process but unlikely.

So, is there any suggestions to speed up this query?

Thanks and cheers
maurice