Subject Terrible left join performance
Author Richard Wesley
Hi All -

I have just been given a case where Firebird falls off a cliff
compared to Excel (using Jet). The culprit seems to be a left join.

SELECT ("vw_ExportForTableau"."Main Type") AS "none:Main Type:nk",
("vw_ExportForTableau"."Offered_dt") AS "none:Offered_dt:qk",
(AVG(CAST("vw_ExportForTableau"."CapRate" AS DOUBLE PRECISION)))
AS "avg:CapRate:qk"
FROM "vw_ExportForTableau$" "vw_ExportForTableau"
LEFT JOIN "zip_codes$" "zip_codes" ON
("vw_ExportForTableau"."Zip_cd" = "zip_codes"."Zip")
WHERE ((NOT (("vw_ExportForTableau"."Main Type") IN ( 'Dev Site',
'Mixed')) ) AND ((EXTRACT(YEAR FROM
"vw_ExportForTableau"."Offered_dt")) = 2006) )
GROUP BY 1,
2;

<PLAN>
PLAN SORT (JOIN (vw_ExportForTableau NATURAL, zip_codes NATURAL))
</PLAN>
[Time] Running the query took 272.2820 sec.
[Time] Getting the records took 0.0450 sec.
[Count] Query returned 1379 records.

By comparison, Excel executes the same query on the same data in half
a second:

SELECT ([_TableauRelation_].[Main Type]) AS [none:Main Type:nk],
([_TableauRelation_].[Offered_dt]) AS [none:Offered_dt:qk],
(AVG([_TableauRelation_].[CapRate])) AS [avg:CapRate:qk]
FROM (
SELECT [vw_ExportForTableau].[PercentOfAsk] AS [PercentOfAsk],
<long list of fields because of Jet's join rules...>
FROM [vw_ExportForTableau$] [vw_ExportForTableau]
LEFT JOIN [zip_codes$] [zip_codes] ON ([vw_ExportForTableau].
[Zip_cd] = [zip_codes].[Zip])
) [_TableauRelation_]
WHERE ((NOT (([_TableauRelation_].[Main Type]) IN ('Dev Site',
'Mixed')) ) AND ((DATEPART('yyyy',[_TableauRelation_].[Offered_dt]))
= 2006) )
GROUP BY ([_TableauRelation_].[Main Type]),
([_TableauRelation_].[Offered_dt]);
[Time] Running the query took 0.4285 sec.
[Time] Getting the records took 0.0164 sec.
[Count] Query returned 1379 records.

To make it worse, the Firebird tables are fully indexed.

I can't give the full schema as it is customer data, but one table is
a generic zip code table and looks like this:

CREATE TABLE "zip_codes$" (
"City" VARCHAR(26) CHARACTER SET UTF8 COLLATE UNICODE,
"County" VARCHAR(25) CHARACTER SET UTF8 COLLATE UNICODE,
"Latitude" FLOAT(53),
"Longitude" FLOAT(53),
"State" VARCHAR(2) CHARACTER SET UTF8 COLLATE UNICODE,
"Zip Class" VARCHAR(11) CHARACTER SET UTF8 COLLATE UNICODE,
"Zip" FLOAT(53)
);

INSERT INTO "zip_codes$"
("City", "County", "Latitude", "Longitude", "State", "Zip
Class", "Zip")
VALUES(?, ?, ?, ?, ?, ?, ?);
Inserted 42741 rows in 6.426 seconds

The other table is a very wide and cutting the fields down to those
in the query:

CREATE TABLE "vw_ExportForTableau$" (
"CapRate" FLOAT(53),
"Main Type" VARCHAR(10) CHARACTER SET UTF8 COLLATE UNICODE,
"Offered_dt" TIMESTAMP,
"Zip_cd" FLOAT(53)
);

INSERT INTO "vw_ExportForTableau$"
("CapRate", "Main Type", "Offered_dt", "Zip_cd")
VALUES(?, ?, ?, ?)
Inserted 36630 rows in 37.560 seconds

CREATE INDEX "_tidx_128_31a" ON "vw_ExportForTableau$" ("Offered_dt");
CREATE INDEX "_tidx_128_22a" ON "vw_ExportForTableau$" ("Main Type");

Any ideas on why this is so much slower? Could it be the float field
used for the join? No better in 2.1b2.

TIA,
________________________________________________________
Richard Wesley Senior Software Developer Tableau
Software
Visit: http://www.trytableau.com/now.html