Subject Re: [firebird-support] inner join index optimization problem
Author Alexandre Benson Smith
officialz wrote:
> I have an optimization problem where (it seems) I can't force the
> optimizer to use an index for a inner join on FB 2.0.3
>
> CREATE TABLE PROPERTY_VALUE(
> ID Numeric(18,0) NOT NULL,
> PROPERTY_VALUE Varchar(1024) CHARACTER SET OCTETS NOT NULL,
> PRIMARY KEY (ID),
> UNIQUE (PROPERTY_VALUE)
> );
> CREATE TABLE PROPERTY_VALUE_LIST(
> PROPERTY_ID Numeric(18,0) NOT NULL,
> PROPERTY_VALUE_ID Numeric(18,0) NOT NULL,
> COMPONENT_ID Numeric(18,0) NOT NULL
> );
> ALTER TABLE PROPERTY_VALUE_LIST ADD
> FOREIGN KEY (PROPERTY_ID) REFERENCES PROPERTY (ID);
> ALTER TABLE PROPERTY_VALUE_LIST ADD
> FOREIGN KEY (PROPERTY_VALUE_ID) REFERENCES PROPERTY_VALUE (ID);
> ALTER TABLE PROPERTY_VALUE_LIST ADD
> FOREIGN KEY (COMPONENT_ID) REFERENCES COMPONENT (ID);
>
> property_value has 12,000 rows
> property_value_list has 25,000,000
>
> Using the sample query:
>
> SELECT pvl.component_id, pvl.property_id, pv.property_value as
> property_value
> FROM property_value pv, property_value_list pvl
> WHERE pvl.system_id = 8413 AND pvl.test_script_id = 67919 AND
> pv.id=pvl.property_value_id
> ORDER BY pvl.component_id, pvl.property_id, pvl.property_value_id
>

Because you are not filtering on Property_Value.ID :-) So, if the
Property_Value table is the first to be scanned it must go NATURAL

If you permits me to give you an advice... *use SQL-92 join syntax*.
Your query would look like this:

SELECT
pvl.component_id, pvl.property_id,
pv.property_value as property_value
FROM
property_value pv join
property_value_list pvl on (pv.id=pvl.PROPERTY_VALUE_ID)
WHERE
pvl.system_id = 8413 AND
pvl.test_script_id = 67919
ORDER BY
pvl.component_id, pvl.property_id, pvl.property_value_id

I would expected a PLAN like:

PLAN SORT (JOIN (PVL INDEX (RDB$FOREIGN73, RDB$FOREIGN72), PV(RDB$PRIMARY39)))

Are your indices statistics up to date ?

what about if you use this one:
SELECT
pvl.component_id, pvl.property_id,
pv.property_value as property_value
FROM
property_value pv join
property_value_list pvl on (pv.id=pvl.PROPERTY_VALUE_ID+0)
WHERE
pvl.system_id = 8413 AND
pvl.test_script_id = 67919
ORDER BY
pvl.component_id, pvl.property_id, pvl.property_value_id



> this uses the plan PLAN SORT (JOIN (PV NATURAL, PVL INDEX
> (RDB$FOREIGN70, RDB$FOREIGN73, RDB$FOREIGN72)))
>
> I would expect it to use the plan
> PLAN SORT (JOIN (PV INDEX(RDB$PRIMARY39), PVL INDEX (RDB$FOREIGN70,
> RDB$FOREIGN73, RDB$FOREIGN72)))
> where
> RDB$FOREIGN70 = pvl.PROPERTY_VALUE_ID
> RDB$FOREIGN73 = pvl.SYSTEM_ID
> RDB$FOREIGN72 = pvl.TEST_SCRIPT_ID
> RDB$PRIMARY39 = pv.property_value.id
>
> However when i try to force it to use that plan:
> SELECT pvl.component_id, pvl.property_id, pv.property_value as
> property_value
> FROM property_value pv, property_value_list pvl
> WHERE pvl.system_id = 8413 AND pvl.test_script_id = 67919 AND
> pv.id=pvl.property_value_id
> PLAN SORT
> (JOIN (PV INDEX(RDB$PRIMARY39), PVL INDEX (RDB$FOREIGN70,
> RDB$FOREIGN73, RDB$FOREIGN72)))
> ORDER BY pvl.component_id, pvl.property_id, pvl.property_value_id
>
> I get
> Message: isc_dsql_prepare failed
>
> SQL Message : -284
> index cannot be used in the specified plan
>
> Engine Code : 335544642
> Engine Message :
> index RDB$PRIMARY39 cannot be used in the specified plan
>
> I would like to know why it does not want to use the index on the pv.id
>
> Then again, if I use:
> SELECT pvl.component_id, pvl.property_id, pv.property_value as
> property_value
> FROM property_value pv, property_value_list pvl
> WHERE pvl.system_id = 8413 AND pvl.test_script_id = 67919 AND
> pv.id=pvl.property_value_id AND pv.id>0
> ORDER BY pvl.component_id, pvl.property_id, pvl.property_value_id
>
> It uses the plan:
> PLAN SORT (JOIN (PV INDEX (RDB$PRIMARY39), PVL INDEX (RDB$FOREIGN70,
> RDB$FOREIGN73, RDB$FOREIGN72)))
>
> The plan looks ok, but it takes just as long to run.
>

pv.id=pvl.property_value_id AND pv.id>0


The pv.id >0 parts is a filter condition on pv.id (that you does not
have on the first query) but renders to be useless, since all PV.ID
would be > 0, so all rows on PV should be scanned, the real filter
conditions are on this:

pvl.system_id = 8413 AND pvl.test_script_id = 67919

so PVL should be the first scanned table and then linked to PV by PV.ID (the PK)

I think FB are thinking PV to be a very small table (so put it on the outer most loop on the JOINS), but it's wrong (this is why I think you have bad statistics)


> The plan using the pv natural and the one using PV INDEX
> (RDB$PRIMARY39) - with the pv.id>0 hack - take 6 secs. to run on a
> core duo 2.4ghz, fb 2.0.3 windows superserver
> I need to make sure the query is optimized. The end result table only
> has 233 rows, and so is the case with
>
> SELECT pvl.component_id, pvl.property_id
> FROM property_value_list pvl
> WHERE pvl.system_id = 8413 AND pvl.test_script_id = 67919
>
> The query above is basically one of the two tables I want to join (the
> other, obviosly is the property_value). This query above completes
> instantly with 233 rows.
>

Because it's using PVL as the first table to be scanned and using the
appropriate indices.

> Now, I can "manually" force the join in this way, using a subquery:
>
> select * from
> (SELECT pvl.component_id, pvl.property_id, pvl.PROPERTY_VALUE_ID
> FROM property_value_list pvl
> WHERE pvl.system_id = 8413 AND pvl.test_script_id = 67919) as pvls
> inner join PROPERTY_VALUE on (PROPERTY_VALUE.id =
> pvls.property_value_id);
> This uses: PLAN JOIN (PROPERTY_VALUE NATURAL, PVLS PVL INDEX
> (RDB$FOREIGN70, RDB$FOREIGN73, RDB$FOREIGN72))
>
> This takes TWO MINUTES (note the PROPERTY_VALUE NATURAL)
>
> If, however, I use the much faster
>
> select * from
> (SELECT pvl.component_id, pvl.property_id, pvl.PROPERTY_VALUE_ID
> FROM property_value_list pvl
> WHERE pvl.system_id = 8413 AND pvl.test_script_id = 67919) as pvls
> left join PROPERTY_VALUE on (PROPERTY_VALUE.id = pvls.property_value_id)
> This uses: PLAN JOIN (PVLS PVL INDEX (RDB$FOREIGN73, RDB$FOREIGN72),
> PROPERTY_VALUE INDEX (RDB$PRIMARY39))
>
> This result table is ready INSTANTLY - and yes, it is using the
> RDB$PRIMARY39 index. =\
>

Because the LEFT JOIN is forcing PVL to be the first scanned table (as I
said above), I think that this is the ideal plan

> I don't have much experience optimizing FB queries (yet) so if
> somebody can throw some light on this behavior, it will be greatly
> appreciated.
>
>

I don't know why the optimizer are choosing to scan first PV instead of
PVL (I suspect you have out of date indices statistics).

You could try this one:

SELECT
pvl.component_id, pvl.property_id,
pv.property_value as property_value
FROM
property_value_list pvl left join
property_value pv on (pv.id=pvl.PROPERTY_VALUE_ID)
WHERE
pvl.system_id = 8413 AND
pvl.test_script_id = 67919
ORDER BY
pvl.component_id, pvl.property_id, pvl.property_value_id


This one should be lightning fast. With a plan like:

PLAN SORT (JOIN (PVL INDEX (RDB$FOREIGN73, RDB$FOREIGN72), PV(RDB$PRIMARY39)))


Double check your indices statistics.

--
Alexandre Benson Smith
Development
THOR Software e Comercial Ltda
Santo Andre - Sao Paulo - Brazil
www.thorsoftware.com.br