Subject | FULL JOIN = lousy performance? |
---|---|
Author | Svein Erling Tysvær |
Post date | 2014-07-28T08:56:59Z |
I tried to see how values were distributed across two tables, and tried something similar to (slightly simplified):
SELECT COALESCE(A.F1, B.F1), COALESCE(A.F2, B.F2),
CASE
WHEN A.PK IS NULL THEN 'B only'
WHEN B.PK IS NULL THEN 'A only'
ELSE 'Both'
END InTable, COUNT(*)
FROM A
FULL JOIN B ON A.F1 = B.F1 AND A.F2 = B.F2
GROUP BY 1, 2, 3
This took longer than I expected, so I stopped the query (I waited for about 15 minutes before stopping it), and rewrote to:
WITH T(F1, F2) AS
(SELECT F1, F2
FROM A
UNION
SELECT F1, F2
FROM B)
SELECT T.F1, T.F2,
CASE
WHEN A.PK IS NULL THEN 'B only'
WHEN B.PK IS NULL THEN 'A only'
ELSE 'Both'
END InTable, COUNT(*)
FROM T
LEFT JOIN A ON T.F1 = A.F1 AND T.F2 = A.F2
LEFT JOIN B ON T.F1 = B.F1 AND T.F2 = B.F2
GROUP BY 1, 2, 3
This query executed fine (don't remember whether it used 30 or 90 seconds to execute).
A contains the about 8000 allowed combinations of B, whereas B contains 1.6 millions of rows. Think of B.F1, B.F2 as being a relaxed FOREIGN KEY - no new entries that aren't in A should be allowed, but historical content should not be deleted, so we don't want to actually implement it as a FK (moreover, in the real database A.F2 contains five digits, whereas B.F2 is six digits with no restrictions on the last digit).
Naturally, the first query use NATURAL for A and B, whereas the second query use natural for both tables in the CTE, but indexes for the LEFT JOIN.
The server where I ran the queries is a Firebird 2.5.1 SuperServer installation.
I hardly ever use FULL JOIN and wonder why the first query should take so much longer than the second?
Set
-INNER JOIN expert, fairly knowledgeable regarding LEFT/RIGHT JOIN, but a newbie regarding FULL JOIN
SELECT COALESCE(A.F1, B.F1), COALESCE(A.F2, B.F2),
CASE
WHEN A.PK IS NULL THEN 'B only'
WHEN B.PK IS NULL THEN 'A only'
ELSE 'Both'
END InTable, COUNT(*)
FROM A
FULL JOIN B ON A.F1 = B.F1 AND A.F2 = B.F2
GROUP BY 1, 2, 3
This took longer than I expected, so I stopped the query (I waited for about 15 minutes before stopping it), and rewrote to:
WITH T(F1, F2) AS
(SELECT F1, F2
FROM A
UNION
SELECT F1, F2
FROM B)
SELECT T.F1, T.F2,
CASE
WHEN A.PK IS NULL THEN 'B only'
WHEN B.PK IS NULL THEN 'A only'
ELSE 'Both'
END InTable, COUNT(*)
FROM T
LEFT JOIN A ON T.F1 = A.F1 AND T.F2 = A.F2
LEFT JOIN B ON T.F1 = B.F1 AND T.F2 = B.F2
GROUP BY 1, 2, 3
This query executed fine (don't remember whether it used 30 or 90 seconds to execute).
A contains the about 8000 allowed combinations of B, whereas B contains 1.6 millions of rows. Think of B.F1, B.F2 as being a relaxed FOREIGN KEY - no new entries that aren't in A should be allowed, but historical content should not be deleted, so we don't want to actually implement it as a FK (moreover, in the real database A.F2 contains five digits, whereas B.F2 is six digits with no restrictions on the last digit).
Naturally, the first query use NATURAL for A and B, whereas the second query use natural for both tables in the CTE, but indexes for the LEFT JOIN.
The server where I ran the queries is a Firebird 2.5.1 SuperServer installation.
I hardly ever use FULL JOIN and wonder why the first query should take so much longer than the second?
Set
-INNER JOIN expert, fairly knowledgeable regarding LEFT/RIGHT JOIN, but a newbie regarding FULL JOIN