Subject | Re: [firebird-support] Force query plan to filter before join |
---|---|
Author | Alec Swan |
Post date | 2012-02-20T19:55:50Z |
Hello Set,
Your guesswork worked quite well. Your query executes 20 times faster than
my original query! Here are the stats for your query:
PLAN SORT (JOIN (JOIN (JOIN (SORT (TMP P NATURAL), PCD INDEX
(FK_zDTEgB/EMb14zlRjEdzCZw==)), CC INDEX (IDX_epDQN2rI7u0hoaXiWuFXAw==)),
JOIN (C INDEX (PK_ZM6SRonqR8AHSQuCISgvnQ==), PC INDEX
(IDX_AlJS5EmMT9tODQnFqmid0w==))))
527 fetches, 0 marks, 60 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 138 index, 12 seq.
Delta memory: 61020 bytes.
Total execution time: 0.067s
Here is the query result:
ACME 6
BANANA 1
CODE A 1
PROJECT A 1
PROJECT B 1
Notice that the sum of counts is 10 - this is what I meant by 'filters out
all but 10 joined rows'. If UNASSIGNED was included in the list it would
have had a count of almost 14,000 associated with it!
So, what I wanted to avoid is for the query optimizer to join all tables
first and only then apply the != 'UNASSIGNED' filter. Instead, I wanted the
optimizer to apply the filter to the PROJECT table first, then join (and
find 10 matching rows) and only after that do the aggregation. And that's
what your query is doing I believe.
But I understand that the optimizer is not doing what I want it to do
because it does not know that 99% of rows are associated with project with
'UNASSIGNED' status and will be filtered after the join.
I like using table expressions, but it will be hard to refactor our ORM
tool to use them, so I am wondering if there is a way to force the plan of
the original query to match your query?
Thanks,
Alec
2012/2/19 Svein Erling Tysv�r <svein.erling.tysvaer@...>
Your guesswork worked quite well. Your query executes 20 times faster than
my original query! Here are the stats for your query:
PLAN SORT (JOIN (JOIN (JOIN (SORT (TMP P NATURAL), PCD INDEX
(FK_zDTEgB/EMb14zlRjEdzCZw==)), CC INDEX (IDX_epDQN2rI7u0hoaXiWuFXAw==)),
JOIN (C INDEX (PK_ZM6SRonqR8AHSQuCISgvnQ==), PC INDEX
(IDX_AlJS5EmMT9tODQnFqmid0w==))))
527 fetches, 0 marks, 60 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 138 index, 12 seq.
Delta memory: 61020 bytes.
Total execution time: 0.067s
Here is the query result:
ACME 6
BANANA 1
CODE A 1
PROJECT A 1
PROJECT B 1
Notice that the sum of counts is 10 - this is what I meant by 'filters out
all but 10 joined rows'. If UNASSIGNED was included in the list it would
have had a count of almost 14,000 associated with it!
So, what I wanted to avoid is for the query optimizer to join all tables
first and only then apply the != 'UNASSIGNED' filter. Instead, I wanted the
optimizer to apply the filter to the PROJECT table first, then join (and
find 10 matching rows) and only after that do the aggregation. And that's
what your query is doing I believe.
But I understand that the optimizer is not doing what I want it to do
because it does not know that 99% of rows are associated with project with
'UNASSIGNED' status and will be filtered after the join.
I like using table expressions, but it will be hard to refactor our ORM
tool to use them, so I am wondering if there is a way to force the plan of
the original query to match your query?
Thanks,
Alec
2012/2/19 Svein Erling Tysv�r <svein.erling.tysvaer@...>
> **[Non-text portions of this message have been removed]
>
>
> >Hello,
> >
> >I need help optimizing the query shown below. PROJECT-related tables
> >contain 12 rows each while COPY-related tables contain 14K rows each.
> >Moreover, PROJECT."ASSIGNMENT_STATUS" != 'UNASSIGNED'
> >filters out all but 10 joined rows.
> >
> >My question is how do I change the plan to force
> >PROJECT."ASSIGNMENT_STATUS" != 'UNASSIGNED' filter to be
> >applied before the join?
>
> Hi Alec!
>
> I don't quite understand what you mean by 'filters out all but 10 joined
> rows', != (not equal) would normally not be able to use any index and
> trying to force that to be applied before any JOIN would of course find all
> that are different from 'UNASSIGNED' (note that NULL is neither equal to or
> different from 'UNASSIGNED') regardless of their connection to any other
> table, and making an index for a field where all but 10 rows have the same
> value would only be useful in very special circumstances. You can of course
> try something like
>
> WITH MyTemp AS
> (SELECT DISTINCT p.ID, p.PRIMARY_PROJECT_CODE
> FROM PROJECT p
> WHERE p.ASSIGNMENT_STATUS != 'UNASSIGNED')
>
> SELECT tmp.PRIMARY_PROJECT_CODE AS COL0, COUNT(pc.ID) AS COL1
> FROM PROJECT_CODE_DESCRIPTOR pcd
> JOIN MyTemp tmp ON pcd.PROJECT_ID = tmp.ID
> JOIN COPY_CLASSIFICATION cc ON pcd.PROJECT_CODE = cc.CLASSIFICATION_CODE
> JOIN COPY c ON cc.COPY_ID = c.ID
> JOIN PHYSICAL_COPY pc ON c.ID = pc.COPY_ID
> WHERE pc.IS_MARKED_DELETED = 0
> AND pc.IS_RECYCLED = 0
> GROUP BY tmp.PRIMARY_PROJECT_CODE
>
> but my guess is that this might be slower than your original query (and
> I'm not certain Firebird would choose the PLAN I'm hoping for, we might
> have to change things), Note that I'm not saying it is impossible to speed
> up your query, we just don't know enough about your tables and how
> selective your indexes are etc. so any suggestion will be more or less
> guesswork.
>
> Set
>
>
>