Subject Re: [firebird-support] Query Optimization (Tables or Views)
Author Ann W. Harrison
Maurice Ling wrote:
>
> create table jclone_occurrence_pmid(
> pmid numeric(10,0) not null,
> clone char(128) not null);

As a general thing using char(128) for variable length
strings will cause you grief.

>
> There are 18 million rows of records in this table.
> My task now is to answer this question: what is the set of pmids that
> has clone A and clone B?
>
> So I ran a query:
> select pmid from jclone_occurrence_pmid
> where clone = 'ACT'
> and pmid in
> (select pmid from jclone_occurrence_pmid
> where clone = 'ACE')
>

If all you want is a list of the pmid's with more than one clone,
sorting and grouping the table will be faster than examining all
the rows in nested loops - it's a Kn*Kn vs KnLogn problem. The
value of K is important, but at some point as n grows, the n squared
vs nLogn dominates.

select pmid from jclone_occurrence_pmid
group by pmid
having count (*) > 1

If you need a list of all the clones for each of these pmid's, I'd
define an index on pmid and a stored procedure like this

create procedure find_dups
returns (Ppmid numeric (10,0), Pclone varchar (128);
as begin
for select pmid from jclone_occurrence_pmid
group by pmid
having count (*) > 1
into :Ppmid
do begin
for select clone from jclone_occurrence_pmid
where pmid = :Ppmid
into :Pclone
do begin
suspend;
end;
end;
end;



In Firebird 2, you could use the grouping select as a derived
table, and join it to the base table, but I'd worry some about
the optimization of that query. The stored procedure above
guarantees that the grouping is done in the outer loop. No
offense intended toward the Arno or anyone working on the
optimizer - aggregated derived tables are a hard case.

Regards,


Ann