firebird-support - Re: [firebird-support] Slow query because an index is not picked up during sort

Subject	Re: [firebird-support] Slow query because an index is not picked up during sort
Author	Alec Swan
Post date	2012-06-03T20:23:26Z

Sean,

The query you proposed will not produce the same result because you
are sorting and topping the join before applying the filter. The
original query was joining and filtering first and after that sorting
and topping. So, your query will always return a subset of rows of the
original query.

The selectivity of IS_DIRTY is 0.5 - a record is either dirty or not.

We have a one-to-one relationship between PHYSICAL_COPY and COPY and
COPY_CLASSIFICATION. So, the solution I went with was to replace INNER
JOINs with LEFT JOINs which changed the execution plan from (I
translated index names):

PLAN SORT (JOIN (COPY INDEX ("IDX_COPY_source_id"), PHYSICAL_COPY
INDEX ("IDX_PHYSICAL_COPY_copy_id"), COPY_CLASSIFICATION INDEX
("IDX_COPY_CLASS_copy_id")))
-- takes about 3 minutes to execute

to

PLAN JOIN (JOIN (PHYSICAL_COPY ORDER "IDX_PHYSICAL_COPY_commit_number"
INDEX ("IDX_PHYSICAL_COPY_commit_number",
"IDX_PHYSICAL_COPY_is_dirty"), COPY INDEX ("PK_COPY_id")),
COPY_CLASSIFICATION INDEX ("IDX_COPY_CLASS_copy_id"))
-- takes about 200ms to execute

This is the second time (see my older post "Force query plan to filter
before join") where LEFT join saved the day. It would be nice to
understand why LEFT join optimization works so much better for us. Is
there an explanation of FB optimization techniques available
somewhere?

Thanks,

Alec

On Sun, Jun 3, 2012 at 12:53 PM, Leyne, Sean <Sean@...> wrote:
>
>
>
> Alex,
>
> > We tracked down the query that generated a 10GB temp file running against
> > a 1.5GB database. Can anybody explain why the query is not using an index
> > on PHYSICAL_COPY."COMMIT_NUMBER"?
>
> Cause the optimizer decided that the "cost" of navigating that index was more than filtering the result set and then performing a sort. The optimizer is concerned about the disk IO cost of a query, as such navigating an index is very expensive since rows are generally stored in index order so, reading each row from different areas of disk is very expensive.
>
> BTW, your post gives us no real details in which to answer your question. We don't know what fields/columns the PLAN refers to -- the hackish names obscure those details. Further, we don't know what the selectivity of the PHYSICAL_COPY."IS_DIRTY" (the index I assume the optimizer selected).
>
> At that said, the fact that the temp file was so much larger than the database size leads me to believe that rather than focus on the query PLAN, you need to re-think the basic query.
>
> Your original select was:
>
> SELECT
> PHYSICAL_COPY."ID", ...
>
>
> FROM PHYSICAL_COPY
> INNER JOIN COPY ON PHYSICAL_COPY."COPY_ID" = COPY."ID"
> INNER JOIN COPY_CLASSIFICATION ON COPY."ID" = COPY_CLASSIFICATION."COPY_ID"
> WHERE
> (PHYSICAL_COPY."IS_DIRTY" = 1) AND
> (COPY."SOURCE_ID" = '123431234') AND
> (PHYSICAL_COPY."COMMIT_NUMBER" >= 10000000)
> ORDER BY
> PHYSICAL_COPY."COMMIT_NUMBER"
>
> If you were to modify the query to read as follows, you would almost completely eliminate the size of the temp file, and significantly increase the performance of the query.
>
> SELECT
> PHYSICAL_COPY."ID", ...
> FROM (
>
> SELECT
> FIRST (1000) PHYSICAL_COPY."ID"
> FROM PHYSICAL_COPY
> INNER JOIN COPY ON PHYSICAL_COPY."COPY_ID" = COPY."ID" ) AbstractTable
>
> WHERE
> (PHYSICAL_COPY."IS_DIRTY" = 1) AND
> (COPY."SOURCE_ID" = '123431234') AND
> (PHYSICAL_COPY."COMMIT_NUMBER" >= 10000000)
> ORDER BY
> PHYSICAL_COPY."COMMIT_NUMBER"
> ) AbstractTable
> INNER JOIN PHYSICAL_COPY on PHYSICAL_COPY."ID" = AbstractTable."ID"
>
> INNER JOIN COPY ON PHYSICAL_COPY."COPY_ID" = COPY."ID"
> INNER JOIN COPY_CLASSIFICATION ON COPY."ID" = COPY_CLASSIFICATION."COPY_ID"
>
> Sean
>
>