Subject Re: [firebird-support] Re: Facing big red brick wall (Optional Values)
Author Arno Brinkman
Hi,

> How come this works? Admittedly, I haven't seen any plan, but it just
> seems strange to repeat the Candidateid for every Attributeid. Is is
> due to using an index on t5.Attributeid and putting this first in the
> plan? Would using
>
> JOIN CandidateAttribute t5 ON (t5.Candidateid=t0.Candidateid
> AND t5.Attributeid+0 IN(3916,1402,784,3461,2184))
>
> or (probably better)
>
> JOIN CandidateAttribute t5 ON (t5.Candidateid=t0.Candidateid+0
> AND t5.Attributeid IN(3916,1402,784,3461,2184))
>
> have a similar effect?

No this doesn't have the same effect.

The reason why i repeated 1 value was because i assumed he had an multi-segment index on CandidateID
and AttributeID. Firebird 1.5 (and lower versions) are not able to "re-use" and nodes with or nodes.
This meant that an index will be used for AttributeID and an index for every CandidateID (when index
available ofcourse). When you "group" the expressions together the optimizer can apply both
expressions to 1 multi-segment index.

SELECT
*
FROM
TableA a
JOIN TableB b ON (b.ID = a.ID)
WHERE
a.ID >= 5 and
b.SUBID IN (1, 2)

Assume there's an index IDX_B_ID on TableB(ID), IDX_B_SUBID on TableB(SUBID) and
IDX_B_ID_SUBID on TableB(ID, SUBID).

The statement above will internally in Firebird be seen as:
SELECT
*
FROM
TableA a
JOIN TableB b ON (b.ID = a.ID)
WHERE
a.ID >= 5 and
b.SUBID = 1 OR
b.SUBID = 2

The optimizer will see these nodes:
1 (b.ID = a.ID)
2 (a.ID >= 5)
3 (b.SUBID = 1 OR b.SUBID = 2)

As you see the OR is seen as 1 node, which is normal because the internal conditions of the OR may
not be aplied separated. To use an index for conditions inside a OR every condition should be able
to use an index.

When the join-order is decided for every stream (table) the optimizer will first try to match the
AND nodes (1 and 2) against the available indexes.
Assume TableA was the first in the join order then IDX_B_ID will be matched with node 1 and the node
is marked as used.
One of the next steps will be processing the OR nodes and try to use indexes there. This step
doesn't try to match the AND nodes (1 and 2) too and thus IDX_B_SUBID will be used twice.

When we "group" the OR nodes:
SELECT
*
FROM
TableA a
JOIN TableB b ON (
(b.ID = a.ID and b.SUBID = 1) OR
(b.ID = a.ID and b.SUBID = 2))
WHERE
a.ID >= 5

The optimizer will see these nodes:
1 (a.ID >= 5)
2 ((b.ID = a.ID and b.SUBID = 1) OR (b.ID = a.ID and b.SUBID = 2))

Now the first step matching AND nodes against indexes (for stream TableB) will do nothing because
there aren't nodes.
The OR step can handle node 2 and can match two nodes to an index. This for the right and left side
of the OR.

This was a simplification of what's inside going on, but i hope you understand it better now.

Regards,
Arno Brinkman
ABVisie

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
General database development support:
http://www.databasedevelopmentforum.com

Firebird open source database (based on IB-OE) with many SQL-99 features :
http://www.firebirdsql.org
http://www.firebirdsql.info
http://www.fingerbird.de/
http://www.comunidade-firebird.org/

Support list for Firebird and Interbase users :
firebird-support@yahoogroups.com

Nederlandse firebird nieuwsgroep :
news://newsgroups.firebirdsql.info