Subject Re: Query Optimization (Tables or Views)
Author Maurice Ling
> >
> > I have a table which acts like a term-list defined index
> >
> > create table jclone_occurrence_pmid(
> > pmid numeric(10,0) not null,
> > clone char(128) not null);
> >
> > There are 18 million rows of records in this table.
> > My task now is to answer this question: what is the set of pmids that
> > has clone A and clone B?
> >
> > So I ran a query:
> > select pmid from jclone_occurrence_pmid where clone = 'ACT' and pmid
> > in (select pmid from jclone_occurrence_pmid where clone = 'ACE')
>
> What PLAN is the query using?
>
> I seriously hope you have an index on clone, preferably one that is
> combined with pmid afterwards to assist garbage collection in FB 1.5
> or earlier.
>
> CREATE INDEX ix_Clone on jclone_occurrence_pmid(clone, pmid);
>
> If you do not have such an index, then your query must do a complete
> table scan in the outer query, and then a complete table scan for each
> record in the subquery (ie 18million*18million + 1 reads - not pretty).
>

Thanks Adam.

I've got an index:
create descending index jclone_occur_pmid_clone on
jclone_occurrence_pmid (clone);

> You haven't mentioned the Firebird version, but pre 1.5 EXISTS
> performs a lot faster than IN, so you could rewrite it using EXISTS.
>

I'm using FB 1.5.2 on Mac OSX and "exist" does not exist.

> >
> > This query took me 2 hours to run. The problem is, I have 13 million
> > of these queries, means it will take 3000 years to run. clearly
> > unacceptable.
> >
> > So, I thought that part of the reason was that the query had to do a
> > self join (please tell me if I am wrong) which brings me to create
> > either a view or a cache table. Can anyone please kindly explain the
> > nature of a view in FB and how it helps?
> >
> > DDL for the view:
> > create view jclone_occur_pair_view (clone1, clone2, pmid) as
> > select a.clone, b.clone, a.pmid
> > from jclone_occurrence_pmid a, jclone_occurrence_pmid b
> > where a.pmid=b.pmid;
> >
> > DDL for the table:
> > create table jclone_occur_pair(
> > clone1 char(128) not null,
> > clone2 char(128) not null,
> > pmid numeric(10,0) not null);
> >
> > "Select count(*) from jclone_occur_pair_view" gives me about 478
> > million rows, which is the same as doing an inner join "select
> > count(*) from jclone_occurrence_pmid a inner join
> > jclone_occurrence_pmid b on a.pmid=b.pmid".
> >
> > But when I try to populate jclone_occur_pair table with the
> following SQL,
> > insert into jclone_occur_pair (clone1, clone2, pmid) select a.clone,
> > b.clone, a.pmid from jclone_occurrence_pmid a, jclone_occurrence_pmid
> > b where a.pmid=b.pmid;
> > commit;
>
> There is a known bug in Firebird with:
>
> insert into table1 (ID) select ID from table1;
>
> Which goes into an infinite loop. Your insert value selected from a
> view is a complex example of the above query.

I think you are confused here.

I don't see how is my insert a complex form of the bug. I can see why
"insert into table1 (ID) select ID from table1;" goes into infinite
loop. It tries to re-insert the inserted rows into the same table, and
tries to re-insert the inserted rows into the same table...... But my
sql is doing a self join and then insert the results into another table.

I've tested if my sql goes into infinite loop by doing the following:
select count(*) from jclone_occurrence_pmid a, jclone_occurrence_pmid
b on a.pmid=b.pmid;
And it gave me a result (about 478 million).

>
> >
> > My database ballooned from 6GB to 40GB and for some weird reasons,
> > died... isql just disappeared from process, using about 260MB of
> > memory (VSIZE in top). I do not eliminate the possibility of killing
> > the wrong process but unlikely.
> >
> > So, is there any suggestions to speed up this query?
> >
>
> Not without looking at your PLAN statement. In iSQL type
>
> SET PLANONLY;
>
> Then type in your query.
>


select count(pmid) from jclone_occurrence_pmid where clone = 'ACT' and
pmid in (select pmid from jclone_occurrence_pmid where clone = 'ACE');

PLAN (JCLONE_OCCURRENCE_PMID INDEX (JCLONE_OCCUR_PMID_CLONE))
PLAN (JCLONE_OCCURRENCE_PMID INDEX (JCLONE_OCCUR_PMID_CLONE))

After creating this index:
create index jclone_occur_pmid_clonepmid on
jclone_occurrence_pmid(clone, pmid);

The same query gave me the following plan:
PLAN (JCLONE_OCCURRENCE_PMID INDEX (JCLONE_OCCUR_PMID_CLONEPMID))
PLAN (JCLONE_OCCURRENCE_PMID INDEX (JCLONE_OCCUR_PMID_CLONE))

Please enlighten me on how to analyze the plan.

Thanks and cheers
Maurice