Subject RE: [firebird-support] Force query plan to filter before join
Author Svein Erling Tysvær
>So, what I wanted to avoid is for the query optimizer to join all tables first and only
>then apply the != 'UNASSIGNED' filter. Instead, I wanted the optimizer to apply the filter
>to the PROJECT table first, then join (and find 10 matching rows) and only after that do
>the aggregation. And that's what your query is doing I believe.
>
>But I understand that the optimizer is not doing what I want it to do because it does not
>know that 99% of rows are associated with project with 'UNASSIGNED' status and will be
>filtered after the join.
>
>I like using table expressions, but it will be hard to refactor our ORM tool to use them,
>so I am wondering if there is a way to force the plan of the original query to match your
>query?

Well, I don't know whether it is possible to make Firebird use an index with != However, it is of course possible to trick Firebird (although then you're basically saying that you know better than the optimizer). Here's what comes to mind (forget the spelling mistakes, I'm so lazy I normally use DBW and rarely write DDL by hand):

CREATE GENERATOR UniqueNumber;

ALTER TABLE PROJECT
ADD FakeIndexField INTEGER;

CREATE INDEX FakeIndex ON PROJECT(FakeIndexField);

SET TERM ^^ ;
CREATE TRIGGER PopulateFakeIndexField FOR PROJECT BEFORE INSERT OR UPDATE AS
BEGIN
IF (new.ASSIGNMENT_STATUS = 'UNASSIGNED') THEN
BEGIN
new.FakeIndexField = GEN_ID(UniqueNumber, 1);
END
ELSE
BEGIN
new.FakeIndexField = -1;
END
END^^

SET TERM ; ^^

Naturally, you need to set FakeIndexField for existing records (unless you start with an empty table), e.g. like this:

UPDATE PROJECT
SET ASSIGNMENT_STATUS = ASSIGNMENT_STATUS;

Then, when you want to use the index, change your query to use FakeIndexField = -1 rather than ASSIGNMENT_STATUS != 'UNASSIGNED':

SELECT P.PRIMARY_PROJECT_CODE AS COL0,
COUNT(PC.ID) AS COL1
FROM PROJECT_CODE_DESCRIPTOR PCD
JOIN PROJECT P ON PCD.PROJECT_ID = P.ID
JOIN COPY_CLASSIFICATION CC ON PCD.PROJECT_CODE = CC.CLASSIFICATION_CODE
JOIN COPY C ON CC.COPY_ID = C.ID
JOIN PHYSICAL_COPY PC ON C.ID = PC.COPY_ID
WHERE P.FakeIndexField = -1
AND PC.IS_MARKED_DELETED = 0
AND PC.IS_RECYCLED = 0
GROUP BY P.PRIMARY_PROJECT_CODE

I would expect this to suffice, if the plan still seems strange, you might have to add +0 to P.ID in the JOIN above. I must admit I've never before made such an extreme attempt to fool the optimizer (creating a GENERATOR with this one purpose: making the optimizer believe a field with 90% duplicates are virtually unique)!

You may also consider a slightly simpler version that doesn't require FakeIndexField, but just adds the generator to any UNASSIGNED, so that it becomes UNASSIGNED1, UNASSIGNED2 etc. However, I don't recommend this, since it will probably make the optimizer choose the index even when you use WHERE p.ASSIGNMENT_STATUS STARTING 'UNASSIGNED', and you still have to modify your query:

WHERE (p.ASSIGNMENT_STATUS < 'UNASSIGNED'
OR p.ASSIGNMENT_STATUS > 'UNASSIGNED999999999999999999')

HTH,
Set