Subject | Why does the optimizer choose NATURAL for this tiny table? |
---|---|
Author | Svein Erling Tysvær |
Post date | 2014-11-12T13:14:56Z |
Fb 2.5.1
I have an EXECUTE BLOCK statement, for which a small part goes NATURAL where I'd expect it to use an index associated with a UNIQUE CONSTRAINT.
I would expect
JOIN icd10_purres p ON p.icd10 IN (t.icd10, substring(t.icd10 from 1 for 3))
to use an index, maybe something along the lines of:
PLAN JOIN (P INDEX(U_ICD10_PURRES_ICD10, U_ICD10_PURRES_ICD10), ...
However, the optimizer goes NATURAL.
ICD10_PURRES is a tiny table, but I thought 112 records might be enough for an index to be useful Or is the possibility of reducing the number of potential records by 99% not enough to warrant the use of an index? Other tables in the query are considerably bigger (NPR contains 7 million, NPR_KRG_ID half a million and NPR_TILSTAND 10 million).
This is part of the following FOR SELECT:
for with tmp (id_npr, avdnr, reshid, icd10, innskrivingsdato, aar, omsorgsniva, diagnosenr) as
(select n.id_npr, n.tjenesteenhetkode, n.tjenesteenhetreshid,
iif(substring(t.tilstand from 4 for 1) = '.', substring(t.tilstand from 1 for 3)||substring(t.tilstand from 5 for 1), substring(t.tilstand from 4 for 1)),
n.innskrivingsdato, extract(year from n.innskrivingsdato), n.omsorgsniva, t.nr
from npr n
join npr_krg_id kid on n.npr_krg_id = kid.npr_krg_id
join npr_tilstand t on n.ID_NPR = t.ID_NPR
where kid.fnr_kryptert = :fnr_k
and not n.innskrivingsdato between current_date-90 and current_date)
select t.id_npr, t.avdnr, t.reshid, t.icd10, t.innskrivingsdato, t.aar, p.icd10_c, t.diagnosenr, t.omsorgsniva, g.gruppe, i.beskrivelse
from tmp t
join icd10_purres p on p.icd10 in (t.icd10, substring(t.icd10 from 1 for 3))
join icd10_gruppe g on g.icd10 = p.icd10_c and not exists(select * from icd10_gruppe g2 where g2.icd10 = g.icd10 and g.gruppe < g2.gruppe)
left join ICD10 i on t.icd10 = i.kode
into ... do ...
for which the optimizer creates this plan:
PLAN JOIN (JOIN (JOIN (T KID INDEX (U_NPR_KRG_ID_FNRK), T N INDEX (I_NPR_KRG_ID), T T INDEX (I_NPR_TILSTAND_ID_NPR)), JOIN (P NATURAL, G INDEX (I_ICD10_GRUPPE_ICD10))), I INDEX (PK_ICD10))
Eventually, this EXECUTE BLOCK will be turned into a STORED PROCEDURE. Currently, I'm looking at running the procedure maybe half a million times in a loop, later it will be run daily or weekly, but then only on a few thousand records each time. I haven't tried running things yet, but would expect running the EXECUTE BLOCK once to finish quickly, whereas half a million times hopefully would take a few hours and not days.
I'm just puzzled and am asking because I would like to increase my understanding...
Set
I have an EXECUTE BLOCK statement, for which a small part goes NATURAL where I'd expect it to use an index associated with a UNIQUE CONSTRAINT.
I would expect
JOIN icd10_purres p ON p.icd10 IN (t.icd10, substring(t.icd10 from 1 for 3))
to use an index, maybe something along the lines of:
PLAN JOIN (P INDEX(U_ICD10_PURRES_ICD10, U_ICD10_PURRES_ICD10), ...
However, the optimizer goes NATURAL.
ICD10_PURRES is a tiny table, but I thought 112 records might be enough for an index to be useful Or is the possibility of reducing the number of potential records by 99% not enough to warrant the use of an index? Other tables in the query are considerably bigger (NPR contains 7 million, NPR_KRG_ID half a million and NPR_TILSTAND 10 million).
This is part of the following FOR SELECT:
for with tmp (id_npr, avdnr, reshid, icd10, innskrivingsdato, aar, omsorgsniva, diagnosenr) as
(select n.id_npr, n.tjenesteenhetkode, n.tjenesteenhetreshid,
iif(substring(t.tilstand from 4 for 1) = '.', substring(t.tilstand from 1 for 3)||substring(t.tilstand from 5 for 1), substring(t.tilstand from 4 for 1)),
n.innskrivingsdato, extract(year from n.innskrivingsdato), n.omsorgsniva, t.nr
from npr n
join npr_krg_id kid on n.npr_krg_id = kid.npr_krg_id
join npr_tilstand t on n.ID_NPR = t.ID_NPR
where kid.fnr_kryptert = :fnr_k
and not n.innskrivingsdato between current_date-90 and current_date)
select t.id_npr, t.avdnr, t.reshid, t.icd10, t.innskrivingsdato, t.aar, p.icd10_c, t.diagnosenr, t.omsorgsniva, g.gruppe, i.beskrivelse
from tmp t
join icd10_purres p on p.icd10 in (t.icd10, substring(t.icd10 from 1 for 3))
join icd10_gruppe g on g.icd10 = p.icd10_c and not exists(select * from icd10_gruppe g2 where g2.icd10 = g.icd10 and g.gruppe < g2.gruppe)
left join ICD10 i on t.icd10 = i.kode
into ... do ...
for which the optimizer creates this plan:
PLAN JOIN (JOIN (JOIN (T KID INDEX (U_NPR_KRG_ID_FNRK), T N INDEX (I_NPR_KRG_ID), T T INDEX (I_NPR_TILSTAND_ID_NPR)), JOIN (P NATURAL, G INDEX (I_ICD10_GRUPPE_ICD10))), I INDEX (PK_ICD10))
Eventually, this EXECUTE BLOCK will be turned into a STORED PROCEDURE. Currently, I'm looking at running the procedure maybe half a million times in a loop, later it will be run daily or weekly, but then only on a few thousand records each time. I haven't tried running things yet, but would expect running the EXECUTE BLOCK once to finish quickly, whereas half a million times hopefully would take a few hours and not days.
I'm just puzzled and am asking because I would like to increase my understanding...
Set