Subject Re: [firebird-support] How to replace NOT IN with JOINs
Author setysvar
>CREATE TABLE AAA_TABLE_A
>(ID D_INTEGER);
>
>CREATE TABLE AAA_TABLE_B
>(ID D_INTEGER, AAA_TABLE_A D_INTEGER, MY_VALUE D_INTEGER, MY_VALUE2
D_INTEGER);
>
>DELETE FROM AAA_TABLE_A;
>INSERT INTO AAA_TABLE_A (ID) VALUES (1);
>INSERT INTO AAA_TABLE_A (ID) VALUES (2);
>INSERT INTO AAA_TABLE_A (ID) VALUES (3);
>INSERT INTO AAA_TABLE_A (ID) VALUES (4);
>INSERT INTO AAA_TABLE_A (ID) VALUES (5);
>
>DELETE FROM AAA_TABLE_B;
>INSERT INTO AAA_TABLE_B (ID, AAA_TABLE_A, MY_VALUE, MY_VALUE2) VALUES
(11, 1, 333, 666);
>INSERT INTO AAA_TABLE_B (ID, AAA_TABLE_A, MY_VALUE, MY_VALUE2) VALUES
(22, 2, 444, 888);
>INSERT INTO AAA_TABLE_B (ID, AAA_TABLE_A, MY_VALUE, MY_VALUE2) VALUES
(33, 4, 333, 666);
>INSERT INTO AAA_TABLE_B (ID, AAA_TABLE_A, MY_VALUE, MY_VALUE2) VALUES
(44, 4, 444, 888);
>INSERT INTO AAA_TABLE_B (ID, AAA_TABLE_A, MY_VALUE, MY_VALUE2) VALUES
(55, 4, 222, 444);
>INSERT INTO AAA_TABLE_B (ID, AAA_TABLE_A, MY_VALUE, MY_VALUE2) VALUES
(66, 5, 111, 222);
>
>SELECT A.ID AS TABLE_A
> FROM AAA_TABLE_A A
> WHERE A.ID NOT IN (SELECT AAA_TABLE_A
> FROM AAA_TABLE_B BB
> INNER JOIN AAA_TABLE_A AA ON AA.ID=BB.AAA_TABLE_A
> WHERE BB.MY_VALUE = 444 AND BB.MY_VALUE2 = 888);
>
>SELECT A.ID AS TABLE_A, B.ID AS TABLE_B, B.MY_VALUE, B.MY_VALUE2
> FROM AAA_TABLE_A A
> LEFT JOIN AAA_TABLE_B B ON B.AAA_TABLE_A = A.ID
> WHERE NOT (B.ID IS NOT NULL AND B.MY_VALUE = 444 AND B.MY_VALUE2 = 888);
>
>I have created very simple test to check NOT IN vs. JOINs where NOT IN
>works file, but it is much slower than JOINs. Of course I didn't
>create here any index, but for few records it doesn't make sense.
>
>I need to get very often a set of record based on subselect which is
>very easy to be solved by JOINs only in case I don't need NOT IN.
>
>Does anybody have an idea who to adapt my JOIN version of SELECT where
>no record with AAA_TABLE_A.ID = 4. Records 1, 3 and 5 are correct
>only.

Thanks for providing a good problem description, including how to create
data to test against, that helped me find an error in what I originally
tried!

I think you may be more interested in knowing how to return record 3 and
5 than record 1, 3 and 5. I would probably have written:

SELECT A.ID
FROM AAA_TABLE_A A
WHERE NOT EXISTS(SELECT *
FROM AAA_TABLE_B B
JOIN AAA_TABLE_B B2 ON B.MY_VALUE = B2.MY_VALUE
AND B.MY_VALUE2 = B2.MY_VALUE2
JOIN AAA_TABLE_A A2 ON B2.AAA_TABLE_A = A2.ID
WHERE A2.ID = 4
AND A.ID = B.AAA_TABLE_A)

I tried getting the same result using only LEFT JOIN:

SELECT DISTINCT A.ID
FROM AAA_TABLE_A A
LEFT JOIN AAA_TABLE_B B ON A.ID = B.AAA_TABLE_A
LEFT JOIN AAA_TABLE_B B2 ON B.MY_VALUE = B2.MY_VALUE
AND B.MY_VALUE2 = B2.MY_VALUE2
LEFT JOIN AAA_TABLE_A A2 ON B2.AAA_TABLE_A = A2.ID
AND A2.ID = 4
WHERE A2.ID IS NULL

This failed since there are values that match B2 without matching A2.ID.
Hence, this evening I'm not able to get you the result you want with
only LEFT JOIN (maybe I could if my brain had worked better tonight, I
don't know).

You did take a shortcut by explicitly specifying 444 and 888 rather than
add another JOIN to AAA_TABLE_A and check for 4 (that's also why your
select return record 1 despite 333 and 666 also existing for record 4).
So your query can be quicker than mine, but if you hadn't taken the
shortcut, using NOT EXISTS should always be quicker or equally quick
when compared to NOT IN (sometimes it can be considerably quicker).

One error you did in your LEFT JOIN attempt, was to compare B against
MY_VALUE and MY_VALUE2 in your WHERE clause. Comparisons against values
on the RIGHT table of a LEFT JOIN normally have to go in the ON clause
to prevent the query from logically changing to an INNER JOIN (IS NULL
can go in the WHERE clause to check for non-existance, and there are
other times when having the RIGHT table in the WHERE clause makes sense,
e.g. when a match have to occur in one of several tables).

Hope this helps,
Set