Subject Re: [firebird-support] Re: Where's my logic flawed?
Author Svein Erling Tysvær
Thanks Dmitry!

I simplified the query to

WITH TMP( TJENESTEENHETRESHID, TJENESTEENHETLOKAL,      IMPORT_DATO ) AS 
( SELECT  TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO ) 
 FROM NPR
 WHERE TJENESTEENHETRESHID > 0
   AND TJENESTEENHETLOKAL  > ''
 GROUP BY 1, 2 )
select t1.tjenesteenhetreshid, t1.tjenesteenhetlokal
from tmp t1
left join tmp t2
       on t1.tjenesteenhetreshid = t2.tjenesteenhetreshid
      and t1.tjenesteenhetlokal <> t2.tjenesteenhetlokal 
      and t1.IMPORT_DATO <= t2.IMPORT_DATO  
where t2.tjenesteenhetreshid is null

But still, I had to terminate this one also after two hours.

None of the fields in this particular query are indexed, they're rarely used in the WHERE clause and have poor selectivity (2K different values in 20M records, it wouldn't surprise me if the most common value may cover 1% of the 20M).

So my question was more: "Why not execute the CTE (against the large table) once, and do the further processing against the (very small) result of the CTE, rather than expand the CTE twice (and thus increase the execution time from 16 minutes to about four weeks)?"

I think I've found a completely different way to solve the particular problem I was trying to solve, so this was more a case of me believing Firebird behaving differently than it does.

Set

2016-06-03 11:25 GMT+02:00 Dmitry Yemanov dimitr@... [firebird-support] <firebird-support@yahoogroups.com>:
03.06.2016 11:27, Svein Erling Tysvær wrote:
> This query:
> WITH TMP( TJENESTEENHETRESHID, TJENESTEENHETLOKAL,      IMPORT_DATO ) AS
> ( SELECT  TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO )
>   FROM NPR
>   WHERE TJENESTEENHETRESHID > 0
>     AND TJENESTEENHETLOKAL  > ''
>   GROUP BY 1, 2 ),
> TMP2(    TJENESTEENHETRESHID, TJENESTEENHETLOKAL ) AS
> ( SELECT TJENESTEENHETRESHID, TJENESTEENHETLOKAL
>    FROM TMP T
>    WHERE NOT EXISTS( SELECT * FROM TMP T2
>                      WHERE T.TJENESTEENHETRESHID = T2.TJENESTEENHETRESHID
>                        AND T.IMPORT_DATO < T2.IMPORT_DATO ) )
> SELECT TJENESTEENHETRESHID, LIST( TJENESTEENHETLOKAL ) TJENESTEENHETLOKAL
> FROM TMP2
> GROUP BY 1
> HAVING COUNT( DISTINCT TJENESTEENHETLOKAL ) = 1
>
> generates this plan:
> PLAN SORT ((TMP2 T2 NPR NATURAL))
> PLAN SORT ((TMP2 T NPR NATURAL))
>
> NPR is a medium sized table (I'm guessing 20 million rows), but the
> result of the first CTE is only 2200 rows. None of the fields in this
> particular query are indexed. I tried to run the query, but decided to
> terminate it after about 2 hours 40 minutes.
> If I run the first CTE separately
>
> ( SELECT  TJENESTEENHETRESHID, TJENESTEENHETLOKAL, MAX( IMPORT_DATO )
>   FROM NPR
>   WHERE TJENESTEENHETRESHID > 0
>     AND TJENESTEENHETLOKAL  > ''
>   GROUP BY 1, 2 )
>
> it takes just over 16 minutes. Putting the result of this CTE in a
> temporary table and running the rest of the query against this temporary
> table only takes a couple of seconds (reducing the 2200 rows to 1600).
> The way I normally read plans seems to fail me this time - I don't quite
> understand why NPR is mentioned twice in the plan (I assume it has to do
> with my "two-level" GROUP BYs). Can anyone explain why? Or better: Does
> anyone understand why the query takes so long to execute?

It's slow because T2 is accessed naturally (i.e. a fullscan) for the
every record in T. NOT EXISTS is kinda nested loop join (anti-join,
actually), it's good only if the subquery is indexed. Otherwise, it
takes 16 minutes + (2200 * 16 minutes) to execute.

The question is why T2 does not use an index inside CTE. Condition
T.IMPORT_DATO < T2.IMPORT_DATO cannot use an index because it's
evaluated against an aggregate: IMPORT_DATO is actually
MAX(NPR.IMPORT_DATO). However, condition T.TJENESTEENHETRESHID =
T2.TJENESTEENHETRESHID should use an index on NPR.TJENESTEENHETRESHID,
if available. But I doubt it exists, otherwise the local condition
TJENESTEENHETRESHID > 0 would also be indexed in this case.

So the only way to run this query fast is to ensure
NPR.TJENESTEENHETRESHID is indexed (and it's the first segment, if the
index is compound).


Dmitry




------------------------------------
Posted by: Dmitry Yemanov <dimitr@...>
------------------------------------

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Visit http://www.firebirdsql.org and click the Documentation item
on the main (top) menu.  Try FAQ and other links from the left-side menu there.

Also search the knowledgebases at http://www.ibphoenix.com/resources/documents/

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
------------------------------------

Yahoo Groups Links

<*> To visit your group on the web, go to:
    http://groups.yahoo.com/group/firebird-support/

<*> Your email settings:
    Individual Email | Traditional

<*> To change settings online go to:
    http://groups.yahoo.com/group/firebird-support/join
    (Yahoo! ID required)

<*> To change settings via email:
    firebird-support-digest@yahoogroups.com
    firebird-support-fullfeatured@yahoogroups.com

<*> To unsubscribe from this group, send an email to:
    firebird-support-unsubscribe@yahoogroups.com

<*> Your use of Yahoo Groups is subject to:
    https://info.yahoo.com/legal/us/yahoo/utos/terms/