Does filter placement matter in Outer Joins?
Left or Right Outer Joins by definition return all rows from either the left or the right table/view and any matching rows from the opposite table/view and NULL’s for any rows that don’t match.
For example a Left Join (Left Outer Join) would contain every row from the left table and any matching rows from the right table and a NULL for all none matching rows.
The example below creates 2 temporary tables and populates with some test data:-
CREATE TABLE #tmp1 ( id INT ) CREATE TABLE #tmp2 ( id INT ) INSERT INTO #tmp1 VALUES (1),(2),(3),(4),(5) INSERT INTO #tmp2 VALUES (1),(2)
If we join these 2 tables with a Left Join
SELECT tmp1.id AS LeftID , tmp2.id AS RightID FROM #tmp1 tmp1 LEFT JOIN #tmp2 tmp2 ON tmp1.id = tmp2.id
We get our expected result set of returning every row from the left table and a matching row from the right table with NULL’s for all none matching rows.
If we introduce a Where clause we can reduce the rows returned.
SELECT tmp1.id AS LeftID , tmp2.id AS RightID FROM #tmp1 tmp1 LEFT JOIN #tmp2 tmp2 ON tmp1.id = tmp2.id WHERE tmp1.id IN ( 2, 3 )
We get our expected result set, ie the result above is filtered down to only return id 2 or 3 from the left table and any corresponding rows from the join.
What happens if we move the Where clause in to the joins On clause
SELECT tmp1.id AS LeftID , tmp2.id AS RightID FROM #tmp1 tmp1 LEFT JOIN #tmp2 tmp2 ON tmp1.id = tmp2.id AND tmp1.id IN ( 2, 3 )
We get this result set
Hmm that’s not what I was expecting! Did you expect that?
The order the query engine processes the join is the important factor here, in all joins there is either a 1, 2 or 3 step process, in this instance (Outer Join) there is a 3 step process, step 1 creates a Cartesian result from the 2 tables (think Cross Join) , step 2 applies the On filter (think Inner Join) then step 3 adds all records that don’t match from either the left or right tables (Outer Join), after all this then the Where clause is applied.
In the first example the left join creates a result set (including NULLs for none matching rows) then applies the filter in the Where clause that reduces the result set that is output.
The second example the filter is applied in the join (step 2), so the join is only matching records 2 or 3 then every none matching record from the left table is added to the result set with a NULL from the opposite table.
Do you know what your left joins are doing?
Incidentally this doesn’t happen with Inner Joins as step 3 is not applied.