Subject Re: [firebird-support] sql, joins and plan explanation.
Author Alexandre Benson Smith
Hi Martin !

Martin Dew wrote:

>Hi.
>
>
>
>I have this statement
>
>
>
>select
>
> count(*), o.Name, rt.description
>
>from
>
> Log l
>
> Join Patient p on (l.urn = p.log_urn)
>
> left outer Join Result_types rt on (p.cons_type = rt.code)
>
> left outer Join Organisation o on (p.surgery = o.code)
>
>Where
>
> (l.taken_at >= '01 Apr 2005 00:00:00' and l.Taken_at <= '05 apr 2005
>00:00:00')
>
> and
>
> p.surgery in ('MAY','GREEN','HEATH','HILLV','PIR','STJOH', 'SHEER',
>'SOUTH', 'PARIS', 'SUNNY', 'YORK1', 'COLL', 'WEST1', 'WEST2', 'CHOB')
>
>Group By
>
> o.Name, rt.description
>
>
>
>and the plan comes out as follows;
>
>PLAN SORT (JOIN (JOIN (JOIN (P NATURAL,L INDEX (RDB$PRIMARY28)),RT INDEX
>(RDB$PRIMARY48)),O INDEX (RDB$PRIMARY34)))
>
>
>
>This query takes about 21 seconds to return;
>
>
>
>
>
>If I change the joins to be just normal inner joins (which I do not
>really want as I need to also see record counts for blank cons_types
>values) as below it;
>
>
>
>select
>
> count(*), o.Name, rt.description
>
>from
>
> Log l
>
> Join Patient p on (l.urn = p.log_urn)
>
> Join Result_types rt on (p.cons_type = rt.code)
>
> Join Organisation o on (p.surgery = o.code)
>
>Where
>
> (l.taken_at >= '01 Apr 2005 00:00:00' and l.Taken_at <= '05 apr 2005
>00:00:00')
>
> and
>
> p.surgery in ('MAY','GREEN','HEATH','HILLV','PIR','STJOH', 'SHEER',
>'SOUTH', 'PARIS', 'SUNNY', 'YORK1', 'COLL', 'WEST1', 'WEST2', 'CHOB')
>
>Group By
>
> o.Name, rt.description
>
>
>
>the plan comes out as followsl
>
>PLAN SORT (JOIN (L INDEX (IX_LOG_TAKEN),P INDEX (RDB$PRIMARY37),RT INDEX
>(RDB$PRIMARY48),O INDEX (RDB$PRIMARY34)))
>
>
>
>You will notice the optomiser has used the IX_LOG_TAKEN index here which
>is very quick, and is what I want it to use in the first example, can
>someone tell me why fb is using completely different plans and what I
>would consider ignoring the fastest index.
>
>
>
>Thanks for any help.
>
>
>
>Martin
>
>
>
The optimizer gurus here are Arno, Ann and Set, they have some magic
powers to bring slow queries to the fast world with some simple tricks
(or magic spells).

What I could tell you is:
FB 1.5 and below has some problems to optimize queries with outer joins,
inner joins are generally perfect optimized (I don't remember getting a
bad plan for inner joins). FB 2.0 has a better optimization algorithm, I
had a similar problem some weeks ago. If you could give FB 2.0 a try to
see how it performs.

Now, getting back to your query:

try this one:

select

count(*), o.Name, rt.description

from

Log l

left Join Patient p on (l.urn = p.log_urn)

left outer Join Result_types rt on (p.cons_type = rt.code)

left outer Join Organisation o on (p.surgery = o.code)

Where

(l.taken_at >= '01 Apr 2005 00:00:00' and l.Taken_at <= '05 apr 2005
00:00:00')

and

p.surgery in ('MAY','GREEN','HEATH','HILLV','PIR','STJOH', 'SHEER',
'SOUTH', 'PARIS', 'SUNNY', 'YORK1', 'COLL', 'WEST1', 'WEST2', 'CHOB')

Group By

o.Name, rt.description

the first left join will force Patient Table to be the first scanned, and could have a side beneffit of the use of index on L.taken_at.

another approach is to try to put the "where" clause on the "join on" clause.

other possibility is:

select

count(*), o.Name, rt.description

from

Log l

Join Patient p on (l.urn + 0 = p.log_urn)

left outer Join Result_types rt on (p.cons_type = rt.code)

left outer Join Organisation o on (p.surgery = o.code)

Where

(l.taken_at >= '01 Apr 2005 00:00:00' and l.Taken_at <= '05 apr 2005
00:00:00')

and

p.surgery in ('MAY','GREEN','HEATH','HILLV','PIR','STJOH', 'SHEER',
'SOUTH', 'PARIS', 'SUNNY', 'YORK1', 'COLL', 'WEST1', 'WEST2', 'CHOB')

Group By

o.Name, rt.description




hth

see you !

--

Alexandre Benson Smith
Development
THOR Software e Comercial Ltda.
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br



--
No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.308 / Virus Database: 266.9.1 - Release Date: 01/04/2005