A query can only have one outer join that produces more than one match

(written by lawrence krubner, however indented passages are often quotes). You can contact lawrence at: lawrence@krubner.com, or follow me on Twitter.


Whenever you have one central table, and you want to outer join it to two separate tables, you are in danger. To see why, think through what happens step-by-step as these tables get joined. All the joins happen before any grouping, so first we make a bunch of rows combining restaurants and inspections. Now for each of those rows, we join to the employees table. That means if we have a restaurant with two inspections, we’ll create rows for all its employees for both inspections. That means when we handle the grouping, our aggregate functions will compute wrong results. For the COUNT, we can solve the problem with COUNT(DISTINCT e.id), but it’s not so easy to solve for the AVG.

My rule is that a query can only have one outer join that produces more than one match. If two outer joins both produce multiple matches, then you’ll have trouble.

The only way I know to generate reports like this is to run a subquery for each outer-joined table, so that you can ensure one row per restaurant in the subquery results:

SELECT r.name, x.avg_score, y.employees
FROM restaurants r
LEFT OUTER JOIN (SELECT restaurant_id, AVG(score) avg_score
FROM inspections
GROUP BY restaurant_id) x
ON r.id = x.restaurant_id
LEFT OUTER JOIN (SELECT restaurant_id, COUNT(id) employees
FROM employees
GROUP BY restaurant_id) y
ON r.id = y.restaurant_id
That approach ensures that your aggregate functions will compute correct results, unpolluted by duplicate rows caused by outer joins to the sibling table(s).

Post external references

  1. 1