Subject Re: Puzzled by choice of plan
Author Svein Erling Tysvær
--- In firebird-support@yahoogroups.com, Alexandre Benson Smith wrote:
> If you are puzzled about query plans what do you expect from others
> ? :-)))

I expect some to be puzzled like me, and some not to be puzzled at all
- simply because they never read plans.

> I think that just Ann and Arno could help here.

Yep, and since the address AnnAndArno (at) firebirdsql.org doesn't
exist, and since I expect others to be interested in the answer as
well, I thought this was the best place to ask the question!

> I am interested on it too, did you checked the same on FB 2.0 ? I
> did some tests and get far better plans with it.

Nope, the database contains sensitive data not to be removed from our
server room. We only recently upgraded our production server to 1.5.2
and the test server still contains some other, less frequently used
databases that are still in use. Hence, simply upgrading that server
to 2.0 would not be too popular. But I could ask Aage if installing 2.
0 besides 1.5.2 on that server will be a problem.

--- In firebird-support@yahoogroups.com, "Ann W. Harrison" wrote:
> Svein Erling Tysvær wrote:
> > I prepared an ad-hoc query similar to:
> >
> > select p.name, d.DateOfDiagnosis
> > from patient p
> > join diagnosis d on d.patientid=p.patientid
> > where d.diagnosis starting '206'
> >
> > To my surprise, the suggested plan was
> > PLAN JOIN(P NATURAL, D INDEX(IDX_DIAGNOSIS, IDX_PATIENT_ID))
> >
> > rather than
> > PLAN JOIN(D INDEX(IDX_DIAGNOSIS), P INDEX(IDX_PATIENT_PK))
> >
> > Now, I had no problem forcing the plan I wanted by adding +0, but
> > I had expected Firebird 1.5.2 (or maybe WI-V6.3.2.4731 is 1.5.3?)
> > to make a better guess.
>
> My guess is that the optimizer assigned a really bad selectivity to
> the "d.diagnosis starting '206'" condition. The selectivity kept in
> 1.5 is based on the whole value of the combined key fields. In 2.0,
> selectivity is kept for each segment of a segmented key, which will
> help some cases, but not yours. If the first three characters of
> the diagnosis are often separately significant, you might consider
> creating a separate indexed field on just those values.

If the optimizer simply had chosen to ignore that index and go
natural, I would have understood your thinking. But why prefer to use
the index for the diagnosis table as the second table in the plan and
use a table not at all referenced in the where clause as the first
table of the plan. How can the optimizer believe the index to be
better when used 1m times as support to a unique index, than when used
once?

The diagnosis field is actually a CHAR(4) field containing digits
between '1400' and '2069'. We add the PK to the index (that PK is not
mentioned in my select) for the field, so the index should be unique.
I did one further check today, and by changing from
where d.diagnosis starting '206', to
where d.diagnosis between '2060' and '2069', the plan changed to what
I would have expected it to be in the first place.

I thought the optimizer believed me to be amongst its friends, but
this times it leaves me simply confused.

Set