Subject | LEFT JOIN and NULLs (performance problems) |
---|---|
Author | tdtappe |
Post date | 2005-08-09T16:14:19Z |
Working with FB 1.5 I noticed some strange performance problem.
So I had a closer look into it and came across some
observations/questions concerning left outer joins and null values.
If I have a SELECT like this:
SELECT
A.PrimaryKeyCol, B.PrimaryKeyCol, C.PrimaryKeyCol
FROM
Table1 A
LEFT OUTER JOIN Table2 B ON A.ForeignKeyColToTable2 = B.PrimaryKeyCol
LEFT OUTER JOIN Table3 C ON B.ForeignKeyColToTable4 =
C.ForeignKeyColToTable4
(Table4 is not directly used or needed in this query)
Table2.ForeignKeyColToTable4 is NOT NULL
Table3.ForeignKeyColToTable4 is nullable and has lots of records with
null "value" for this column.
If the first join does not find any record in Table2 the value for
B.PrimaryKeyCol is NULL.
And also the value for the next join to Table3 is NULL. So the third
join does not "find" any records
for Table3. Whether there are records in Table3 with NULL values for
ForeignKeyColToTable4 or not.
So far everything seems to be correct. The resultset is as expected.
BUT: If I now look at the reads needed to execute and fetch this
select command I see that Table3 is scanned for records with null
values. And because there are a lot of them performance gets really bad
as soon as the join to Table2 doesn't find any record.
In my real world case instead of about 70 index reads Firebird needs
almost 70.000 indexed reads.
And it will get worse because more "null records" will populate Table3.
So what do you think about it? Should Firebird do this? Am I wrong to
think that Firebird shouldn't?
And of course: What can I do (as a workaround?) to speed up things?
--Heiko
So I had a closer look into it and came across some
observations/questions concerning left outer joins and null values.
If I have a SELECT like this:
SELECT
A.PrimaryKeyCol, B.PrimaryKeyCol, C.PrimaryKeyCol
FROM
Table1 A
LEFT OUTER JOIN Table2 B ON A.ForeignKeyColToTable2 = B.PrimaryKeyCol
LEFT OUTER JOIN Table3 C ON B.ForeignKeyColToTable4 =
C.ForeignKeyColToTable4
(Table4 is not directly used or needed in this query)
Table2.ForeignKeyColToTable4 is NOT NULL
Table3.ForeignKeyColToTable4 is nullable and has lots of records with
null "value" for this column.
If the first join does not find any record in Table2 the value for
B.PrimaryKeyCol is NULL.
And also the value for the next join to Table3 is NULL. So the third
join does not "find" any records
for Table3. Whether there are records in Table3 with NULL values for
ForeignKeyColToTable4 or not.
So far everything seems to be correct. The resultset is as expected.
BUT: If I now look at the reads needed to execute and fetch this
select command I see that Table3 is scanned for records with null
values. And because there are a lot of them performance gets really bad
as soon as the join to Table2 doesn't find any record.
In my real world case instead of about 70 index reads Firebird needs
almost 70.000 indexed reads.
And it will get worse because more "null records" will populate Table3.
So what do you think about it? Should Firebird do this? Am I wrong to
think that Firebird shouldn't?
And of course: What can I do (as a workaround?) to speed up things?
--Heiko