Subject | Shocked by the optimizer (Arno, where art thou?) |
---|---|
Author | Svein Erling Tysvær |
Post date | 2005-12-14T08:53:44Z |
I normally avoid using outer joins, but this time I had a problem that
I thought was best solved using a LEFT JOIN. Surprised by the
suggested plan, I decided to research a little - and got even more
surprised. Here's the simple SQL statement I ended up with to make the
optimizer look a fool:
SELECT * /*which fields are irrelevant, but I specified fields from
all tables*/
FROM TableA A
JOIN TableB B on A.PK = B.Apk
LEFT JOIN TableC on B.PK = C.Bpk
WHERE B.PK = 1
A.PK and B.PK are primary keys and there are very selective indexes
for B.Apk and C.Bpk (normally each value is repeated between 1 and 10
times and each table is over 1 million records in total).
To me, the obvious plan (which is suggested if I use a plain JOIN and
not a LEFT JOIN) is
PLAN JOIN (JOIN (B INDEX(PK_INDEXB), A INDEX(PK_INDEXA)), C INDEX
(B_PK_INDEX))
but the optimizer suggests
PLAN JOIN(JOIN (A NATURAL, B INDEX(PK_INDEXB)), C INDEX (B_PK_INDEX))
In plain English, in my particular case that means the optimizer
thinks it is better to walk sequentially through a table with 1
million records rather than to look up one particular record in
another table that is marginally bigger.
I tested this on Firebird 1.5.2 and one of the release candidates for
1.5.3. I have not tested it on Firebird 2.0. Needless to say, when
hardcoding the plan, the obvious plan executes considerably faster
than the plan the optimizer suggests.
I know LEFT JOINs make it harder for the optimizer, but I never
thought it was this easy to confuse it. Why does it behave this way?
Set
I thought was best solved using a LEFT JOIN. Surprised by the
suggested plan, I decided to research a little - and got even more
surprised. Here's the simple SQL statement I ended up with to make the
optimizer look a fool:
SELECT * /*which fields are irrelevant, but I specified fields from
all tables*/
FROM TableA A
JOIN TableB B on A.PK = B.Apk
LEFT JOIN TableC on B.PK = C.Bpk
WHERE B.PK = 1
A.PK and B.PK are primary keys and there are very selective indexes
for B.Apk and C.Bpk (normally each value is repeated between 1 and 10
times and each table is over 1 million records in total).
To me, the obvious plan (which is suggested if I use a plain JOIN and
not a LEFT JOIN) is
PLAN JOIN (JOIN (B INDEX(PK_INDEXB), A INDEX(PK_INDEXA)), C INDEX
(B_PK_INDEX))
but the optimizer suggests
PLAN JOIN(JOIN (A NATURAL, B INDEX(PK_INDEXB)), C INDEX (B_PK_INDEX))
In plain English, in my particular case that means the optimizer
thinks it is better to walk sequentially through a table with 1
million records rather than to look up one particular record in
another table that is marginally bigger.
I tested this on Firebird 1.5.2 and one of the release candidates for
1.5.3. I have not tested it on Firebird 2.0. Needless to say, when
hardcoding the plan, the obvious plan executes considerably faster
than the plan the optimizer suggests.
I know LEFT JOINs make it harder for the optimizer, but I never
thought it was this easy to confuse it. Why does it behave this way?
Set