Subject Re: Query Optimization (Tables or Views)
Author Adam
--- In firebird-support@yahoogroups.com, "Maurice Ling" <beldin79@...>
wrote:
>
> Hi all,
>
> I have a table which acts like a term-list defined index
>
> create table jclone_occurrence_pmid(
> pmid numeric(10,0) not null,
> clone char(128) not null);
>
> There are 18 million rows of records in this table.
> My task now is to answer this question: what is the set of pmids that
> has clone A and clone B?
>
> So I ran a query:
> select pmid from jclone_occurrence_pmid where clone = 'ACT' and pmid
> in (select pmid from jclone_occurrence_pmid where clone = 'ACE')

What PLAN is the query using?

I seriously hope you have an index on clone, preferably one that is
combined with pmid afterwards to assist garbage collection in FB 1.5
or earlier.

CREATE INDEX ix_Clone on jclone_occurrence_pmid(clone, pmid);

If you do not have such an index, then your query must do a complete
table scan in the outer query, and then a complete table scan for each
record in the subquery (ie 18million*18million + 1 reads - not pretty).

You haven't mentioned the Firebird version, but pre 1.5 EXISTS
performs a lot faster than IN, so you could rewrite it using EXISTS.

>
> This query took me 2 hours to run. The problem is, I have 13 million
> of these queries, means it will take 3000 years to run. clearly
> unacceptable.
>
> So, I thought that part of the reason was that the query had to do a
> self join (please tell me if I am wrong) which brings me to create
> either a view or a cache table. Can anyone please kindly explain the
> nature of a view in FB and how it helps?
>
> DDL for the view:
> create view jclone_occur_pair_view (clone1, clone2, pmid) as
> select a.clone, b.clone, a.pmid
> from jclone_occurrence_pmid a, jclone_occurrence_pmid b
> where a.pmid=b.pmid;
>
> DDL for the table:
> create table jclone_occur_pair(
> clone1 char(128) not null,
> clone2 char(128) not null,
> pmid numeric(10,0) not null);
>
> "Select count(*) from jclone_occur_pair_view" gives me about 478
> million rows, which is the same as doing an inner join "select
> count(*) from jclone_occurrence_pmid a inner join
> jclone_occurrence_pmid b on a.pmid=b.pmid".
>
> But when I try to populate jclone_occur_pair table with the
following SQL,
> insert into jclone_occur_pair (clone1, clone2, pmid) select a.clone,
> b.clone, a.pmid from jclone_occurrence_pmid a, jclone_occurrence_pmid
> b where a.pmid=b.pmid;
> commit;

There is a known bug in Firebird with:

insert into table1 (ID) select ID from table1;

Which goes into an infinite loop. Your insert value selected from a
view is a complex example of the above query.

>
> My database ballooned from 6GB to 40GB and for some weird reasons,
> died... isql just disappeared from process, using about 260MB of
> memory (VSIZE in top). I do not eliminate the possibility of killing
> the wrong process but unlikely.
>
> So, is there any suggestions to speed up this query?
>

Not without looking at your PLAN statement. In iSQL type

SET PLANONLY;

Then type in your query.

Adam