Subject Re: Where's my logic flawed?
Author Dmitry Yemanov
03.06.2016 11:27, Svein Erling Tysvær wrote:
> This query:
> WITH TMP( TJENESTEENHETRESHID, TJENESTEENHETLOKAL, IMPORT_DATO ) AS
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO )
> FROM NPR
> WHERE TJENESTEENHETRESHID > 0
> AND TJENESTEENHETLOKAL > ''
> GROUP BY 1, 2 ),
> TMP2( TJENESTEENHETRESHID, TJENESTEENHETLOKAL ) AS
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL
> FROM TMP T
> WHERE NOT EXISTS( SELECT * FROM TMP T2
> WHERE T.TJENESTEENHETRESHID = T2.TJENESTEENHETRESHID
> AND T.IMPORT_DATO < T2.IMPORT_DATO ) )
> SELECT TJENESTEENHETRESHID, LIST( TJENESTEENHETLOKAL ) TJENESTEENHETLOKAL
> FROM TMP2
> GROUP BY 1
> HAVING COUNT( DISTINCT TJENESTEENHETLOKAL ) = 1
>
> generates this plan:
> PLAN SORT ((TMP2 T2 NPR NATURAL))
> PLAN SORT ((TMP2 T NPR NATURAL))
>
> NPR is a medium sized table (I'm guessing 20 million rows), but the
> result of the first CTE is only 2200 rows. None of the fields in this
> particular query are indexed. I tried to run the query, but decided to
> terminate it after about 2 hours 40 minutes.
> If I run the first CTE separately
>
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO )
> FROM NPR
> WHERE TJENESTEENHETRESHID > 0
> AND TJENESTEENHETLOKAL > ''
> GROUP BY 1, 2 )
>
> it takes just over 16 minutes. Putting the result of this CTE in a
> temporary table and running the rest of the query against this temporary
> table only takes a couple of seconds (reducing the 2200 rows to 1600).
> The way I normally read plans seems to fail me this time - I don't quite
> understand why NPR is mentioned twice in the plan (I assume it has to do
> with my "two-level" GROUP BYs). Can anyone explain why? Or better: Does
> anyone understand why the query takes so long to execute?

It's slow because T2 is accessed naturally (i.e. a fullscan) for the
every record in T. NOT EXISTS is kinda nested loop join (anti-join,
actually), it's good only if the subquery is indexed. Otherwise, it
takes 16 minutes + (2200 * 16 minutes) to execute.

The question is why T2 does not use an index inside CTE. Condition
T.IMPORT_DATO < T2.IMPORT_DATO cannot use an index because it's
evaluated against an aggregate: IMPORT_DATO is actually
MAX(NPR.IMPORT_DATO). However, condition T.TJENESTEENHETRESHID =
T2.TJENESTEENHETRESHID should use an index on NPR.TJENESTEENHETRESHID,
if available. But I doubt it exists, otherwise the local condition
TJENESTEENHETRESHID > 0 would also be indexed in this case.

So the only way to run this query fast is to ensure
NPR.TJENESTEENHETRESHID is indexed (and it's the first segment, if the
index is compound).


Dmitry