Introduction
The outer join is one of the most frequently misunderstood operators in relational query processing.
Unlike the inner join, which discards tuples that fail the join predicate, an outer join preserves tuples from one or both input relations by padding unmatched attributes with NULL values.
This seemingly simple extension has deep consequences for query optimization, logical equivalences, and correct query evaluation.
Understanding how NULL-extended tuples arise, propagate, and interact with predicates is essential for anyone building or reasoning about query engines.
The SQL standard defines three variants: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN.
Each variant specifies which input relation is "preserved," meaning its tuples will appear in the output regardless of whether they find a match in the other relation.
The mechanism that makes this possible is NULL extension: when a tuple from the preserved side has no matching partner, the result tuple fills in NULL for every attribute of the non-preserved side.
Formal Semantics
Let R and S be two relations, and let θ be a join predicate over attributes of R and S.
Define the standard inner join as:
R ⋈_θ S = { r ∘ s | r ∈ R, s ∈ S, θ(r, s) }
where r ∘ s denotes tuple concatenation.
The left outer join is then defined as:
R ⟕_θ S = (R ⋈_θ S) ∪ { r ∘ NULL_S | r ∈ R, ¬∃s ∈ S : θ(r, s) }
Here NULL_S is a tuple of NULLs with the same schema as S.
The second component of the union captures precisely those R-tuples that found no match.
These are the NULL-extended tuples.
Symmetrically, the right outer join preserves all tuples from S:
R ⟖_θ S = (R ⋈_θ S) ∪ { NULL_R ∘ s | s ∈ S, ¬∃r ∈ R : θ(r, s) }
The full outer join preserves tuples from both sides:
R ⟗_θ S = (R ⋈_θ S) ∪ { r ∘ NULL_S | r ∈ R, ¬∃s ∈ S : θ(r, s) }
∪ { NULL_R ∘ s | s ∈ S, ¬∃r ∈ R : θ(r, s) }
A critical property: the set of NULL-extended tuples and the set of inner-join tuples are always disjoint in the output.
Every output tuple is either a genuine matched pair or a padded singleton.
This partitioning is important for understanding predicate behavior.
NULL-Extended Tuples and Predicate Interaction
NULL values introduced by outer joins behave according to three-valued logic (3VL).
Any comparison involving NULL evaluates to UNKNOWN, and standard SQL WHERE clauses filter out tuples where the predicate is not TRUE.
This creates a critical interaction: applying a filter on a NULL-extended attribute effectively converts an outer join back into an inner join.
Consider:
SELECT *
FROM R LEFT JOIN S ON R.a = S.b
WHERE S.c > 10
The WHERE clause references S.c.
For any NULL-extended tuple (where S's attributes are all NULL), the predicate S.c > 10 evaluates to UNKNOWN, so the tuple is discarded.
The result is identical to an inner join.
This phenomenon is called NULL rejection, and the predicate is called a null-rejecting (or strong) predicate with respect to the NULL-extended side.
Formally, a predicate p is null-rejecting on a set of attributes A if, whenever every attribute in A is NULL, p evaluates to FALSE or UNKNOWN.
Recognizing null-rejecting predicates is one of the most important optimizations in outer join processing, because it allows the optimizer to simplify outer joins to inner joins, which have far more reordering freedom.
Associativity and Commutativity Challenges
Inner joins are both commutative and associative, which gives the optimizer freedom to reorder joins in any sequence.
Outer joins are not, in general, commutative or associative.
This severely restricts the search space for join enumeration.
Consider left outer join associativity.
The expression (R ⟕ S) ⟕ T is not, in general, equivalent to R ⟕ (S ⟕ T).
The reason is that the NULL extension in the first expression pads S-attributes with NULL when an R-tuple has no match.
If those NULLs then fail the join with T, T-attributes are also padded with NULL.
In the second expression, the inner left join S ⟕ T happens first, and R then joins with the already-padded result, producing a different set of NULL-extended tuples.
Galindo-Legaria and Rosenthal (1997) developed a comprehensive framework for identifying legal reorderings of outer joins.
Their approach introduces the concept of an eligibility list and conflict-based rules, tracking which relations are preserved and which are null-extended, to determine when two join operators can be swapped without changing the result.
The key insight is that reordering becomes legal when null-rejecting predicates are present.
If a predicate downstream of an outer join rejects NULLs on the padded side, the outer join effectively becomes an inner join for the purposes of reordering.
Walkthrough
Below is a step-by-step walkthrough of evaluating a left outer join with a subsequent filter, showing how NULL-extended tuples are generated and then eliminated.
Input relations:
R: S:
| a | x | | b | c |
|---|---| |---|---|
| 1 | p | | 1 | 20|
| 2 | q | | 3 | 5 |
| 3 | r |
Step 1: Compute the inner join on R.a = S.b
Matched tuples:
| a | x | b | c |
|---|---|----|----|
| 1 | p | 1 | 20 |
| 3 | r | 3 | 5 |
Tuple (2, q) from R has no match in S.
Step 2: Generate NULL-extended tuples for unmatched R-tuples
| a | x | b | c |
|---|---|------|------|
| 2 | q | NULL | NULL |
Step 3: Union matched and NULL-extended tuples
LEFT JOIN result:
| a | x | b | c |
|---|---|------|------|
| 1 | p | 1 | 20 |
| 3 | r | 3 | 5 |
| 2 | q | NULL | NULL |
Step 4: Apply WHERE S.c > 10
Evaluate predicate on each tuple:
- (1, p, 1, 20): 20 > 10 is TRUE. Keep.
- (3, r, 3, 5): 5 > 10 is FALSE. Discard.
- (2, q, NULL, NULL): NULL > 10 is UNKNOWN. Discard.
Final result:
| a | x | b | c |
|---|---|---|----|
| 1 | p | 1 | 20 |
Observation: The NULL-extended tuple was eliminated by the filter.
The predicate S.c > 10 is null-rejecting on S's attributes.
An optimizer can therefore rewrite this query as an inner join with both the join condition and the filter applied, potentially enabling more efficient join orderings.
Optimization Implications
Recognizing and exploiting null-rejecting predicates is a cornerstone of outer join optimization.
The simplification from outer join to inner join opens the door to standard join reordering techniques, including dynamic programming over join graphs that assume commutativity and associativity.
Beyond simplification, outer joins interact with other operators in subtle ways:
Predicate pushdown. A predicate can be pushed below an outer join if it references only the preserved side.
If it references the null-extended side and is null-rejecting, the outer join can first be converted to an inner join, and then the predicate can be pushed.
Pushing a non-null-rejecting predicate on the null-extended side past an outer join is incorrect, because it would filter tuples before NULL extension has a chance to produce them.
Group-by and aggregation. Aggregation over outer join results must account for NULL-extended tuples.
COUNT(*) includes them, COUNT(column) on a null-extended attribute does not.
SUM, AVG, and similar aggregates skip NULLs.
Incorrect placement of aggregation relative to outer joins can produce wrong results, a common source of bugs in hand-written and generated SQL.
View merging. When a view contains an outer join, merging it into an outer query requires careful analysis of whether predicates from the outer query are null-rejecting with respect to the view's NULL-extended attributes.
Incorrect merging can change outer joins to inner joins silently, or, worse, preserve the outer join when it should have been converted.
Key Points
- Outer joins preserve tuples from one or both relations by padding unmatched attributes with NULL values, producing NULL-extended tuples that are disjoint from the inner join result.
- NULL-extended tuples follow three-valued logic: any predicate comparing a NULL attribute evaluates to UNKNOWN and is filtered by WHERE clauses.
- A null-rejecting predicate on the NULL-extended side of an outer join effectively converts it to an inner join, enabling standard join reordering.
- Outer joins are neither commutative nor associative in general, which severely constrains the optimizer's join enumeration search space.
- Predicate pushdown past an outer join is only safe for predicates on the preserved side, or after outer-to-inner conversion via null rejection.
- Aggregation functions interact differently with NULL-extended tuples: COUNT(*) includes them, while COUNT(column), and most aggregates over null-extended columns do not.
- Galindo-Legaria and Rosenthal's conflict-based framework provides the theoretical foundation for determining legal outer join reorderings in modern query optimizers.
References
Galindo-Legaria, C. and Rosenthal, A. "Outerjoin Simplification and Reordering for Query Optimization." ACM Transactions on Database Systems, 22(1), 1997, pp. 43-73.
Codd, E.F. "Extending the Database Relational Model to Capture More Meaning." ACM Transactions on Database Systems, 4(4), 1979, pp. 397-434.
Garcia-Molina, H., Ullman, J.D., and Widom, J. "Database Systems: The Complete Book." Prentice Hall, 2nd edition, 2008.
Date, C.J. and Darwen, H. "A Guide to the SQL Standard." Addison-Wesley, 4th edition, 1997.
Rao, J., Lindsay, B., Lohman, G., Pirahesh, H., and Simmen, D. "Using EELs: A Practical Approach to Outerjoin and Antijoin Reordering." Proceedings of the 17th International Conference on Data Engineering (ICDE), 2001, pp. 585-594.