firebird-support - Re: [firebird-support] Query Optimization (Tables or Views)

Subject	Re: [firebird-support] Query Optimization (Tables or Views)
Author	Helen Borrie
Post date	2006-07-13T01:25:05Z

At 10:30 AM 13/07/2006, you wrote:

>Hi all,
>
>I have a table which acts like a term-list defined index
>
>create table jclone_occurrence_pmid(
> pmid numeric(10,0) not null,
> clone char(128) not null);
>
>There are 18 million rows of records in this table.
>My task now is to answer this question: what is the set of pmids that
>has clone A and clone B?
>
>So I ran a query:
>select pmid from jclone_occurrence_pmid where clone = 'ACT' and pmid
>in (select pmid from jclone_occurrence_pmid where clone = 'ACE')
>
>This query took me 2 hours to run. The problem is, I have 13 million
>of these queries, means it will take 3000 years to run. clearly
>unacceptable.
>
>So, I thought that part of the reason was that the query had to do a
>self join (please tell me if I am wrong)

You need either:

select
jcop1.pmid from jclone_occurrence_pmid jcop1
join jclone_occurrence_pmid jcop2
on jcop2.pmid = jcop1.pmid
where jcop2.clone = 'ACE'

or

select
jcop1.pmid from jclone_occurrence_pmid jcop1
where
jcop1.clone = 'ACT'
and exists
(select 1 from jclone_occurrence_pmid jcop2
where jcop2.pmid = jcop1.pmid
and jcop2.clone = 'ACE')

>which brings me to create
>either a view or a cache table. Can anyone please kindly explain the
>nature of a view in FB and how it helps?

A view is just a precompiled query that you can treat (almost) as
though it were a table.

>My database ballooned from 6GB to 40GB and for some weird reasons,
>died...

Yes, a 40 Gb table isn't possible in Firebird < v.2.0. The
ballooning is no doubt due to the fact that your "join syntax" was
actually doing a Cartesian join.

>So, is there any suggestions to speed up this query?

Indexes? You can't expect a query on gazillions of rows to be
subsecond, but you'll get nowhere fast without an index to optimise
the join and the search.

Index both columns separately. Better still from a data integrity
point of view, put a unique constraint across both (with pmid on the
left) and ALSO an index on CLONE.

I also question the wisdom of limiting the valid entries for pmid to
a scale of 10. The engine will store this as BigInt anyway, so
you're not saving any disk space by limiting it this way. Why not
define it as BigInt?

./heLen