Introduction
SQL's NULL is one of the most misunderstood constructs in relational databases.
Unlike every other value in a database, NULL does not represent a known datum.
It represents the absence of a value, and this single design decision forces the entire logical foundation of SQL away from classical Boolean logic and into three-valued logic (3VL).
The consequences are pervasive: they affect query evaluation, indexing, aggregation, constraint checking, and join behavior.
Engineers who treat NULL as "just another value" routinely write queries that silently return incorrect results.
This article covers the formal semantics of NULL in SQL, the mechanics of three-valued logic, and the practical implications for query evaluation in real database systems.
The Origins of NULL
E.F.
Codd introduced the concept of missing information in his relational model.
In his later work, he actually proposed two kinds of nulls: one for "applicable but missing" (the value exists but is unknown) and one for "inapplicable" (the value does not apply to this entity).
SQL adopted a simpler design with a single NULL marker that conflates both meanings.
The SQL standard (ISO/IEC 9075) defines NULL not as a value but as a marker indicating the absence of a value.
This distinction matters. NULL is not zero, not an empty string, and not a special sentinel.
It belongs to no data type.
Every data type in SQL is extended with the possibility of being NULL, unless a NOT NULL constraint prohibits it.
Three-Valued Logic
Classical Boolean logic has two truth values: TRUE and FALSE.
When NULL enters a comparison, the result is neither TRUE nor FALSE but UNKNOWN.
This third truth value gives rise to three-valued logic.
SQL's WHERE clause filters rows for which the predicate evaluates to TRUE.
Rows producing UNKNOWN are excluded, just like rows producing FALSE.
Truth Tables
The standard logical connectives are extended as follows.
AND:
| A | B | A AND B |
|---|---|---|
| TRUE | TRUE | TRUE |
| TRUE | FALSE | FALSE |
| TRUE | UNKNOWN | UNKNOWN |
| FALSE | TRUE | FALSE |
| FALSE | FALSE | FALSE |
| FALSE | UNKNOWN | FALSE |
| UNKNOWN | TRUE | UNKNOWN |
| UNKNOWN | FALSE | FALSE |
| UNKNOWN | UNKNOWN | UNKNOWN |
OR:
| A | B | A OR B |
|---|---|---|
| TRUE | TRUE | TRUE |
| TRUE | FALSE | TRUE |
| TRUE | UNKNOWN | TRUE |
| FALSE | TRUE | TRUE |
| FALSE | FALSE | FALSE |
| FALSE | UNKNOWN | UNKNOWN |
| UNKNOWN | TRUE | TRUE |
| UNKNOWN | FALSE | UNKNOWN |
| UNKNOWN | UNKNOWN | UNKNOWN |
NOT:
| A | NOT A |
|---|---|
| TRUE | FALSE |
| FALSE | TRUE |
| UNKNOWN | UNKNOWN |
The key intuition: FALSE AND UNKNOWN is FALSE (because regardless of the unknown value, the conjunction cannot be true). TRUE OR UNKNOWN is TRUE (because regardless of the unknown value, the disjunction is already satisfied).
In all other cases involving UNKNOWN, the result is UNKNOWN.
Walkthrough
Consider a table employees with columns name, department, and salary, where some rows have NULL in the salary column.
Query:
SELECT name FROM employees WHERE salary > 50000;
Step-by-step evaluation for a row where salary is NULL:
- Evaluate the expression
salary > 50000. - Substitute:
NULL > 50000. - Any comparison with
NULLyieldsUNKNOWN. - The
WHEREclause requiresTRUEto include the row. UNKNOWNis notTRUE, so the row is excluded.
Query:
SELECT name FROM employees WHERE salary > 50000 OR department = 'Engineering';
Evaluation for a row where salary is NULL and department is 'Engineering':
NULL > 50000evaluates toUNKNOWN.'Engineering' = 'Engineering'evaluates toTRUE.UNKNOWN OR TRUEevaluates toTRUE(see OR truth table).- The row is included.
Evaluation for a row where salary is NULL and department is NULL:
NULL > 50000evaluates toUNKNOWN.NULL = 'Engineering'evaluates toUNKNOWN.UNKNOWN OR UNKNOWNevaluates toUNKNOWN.- The row is excluded.
The NOT IN trap:
SELECT name FROM employees WHERE department NOT IN ('Sales', NULL);
This query returns zero rows, regardless of data.
Here is why:
NOT IN (...)expands todepartment <> 'Sales' AND department <> NULL.department <> NULLevaluates toUNKNOWNfor every row, because any comparison againstNULLyieldsUNKNOWN.anything AND UNKNOWNis eitherFALSEorUNKNOWN, neverTRUE.- No row passes the filter.
This trap applies equally when the list comes from a subquery that returns any NULL value — not just when a literal NULL appears in a static list.
It is one of the most common NULL-related bugs in production SQL.
Practical Consequences
Equality and the IS Operator
NULL = NULL evaluates to UNKNOWN, not TRUE.
Two nulls are not considered equal.
This is logically consistent (two unknown values have no reason to be the same), but it surprises engineers accustomed to programming languages where null == null is true.
To test for NULL, SQL provides the IS NULL and IS NOT NULL predicates, which always return TRUE or FALSE, never UNKNOWN.
Handling NULLs Explicitly
SQL provides several functions for working with nulls explicitly. COALESCE(expr1, expr2, ...) returns the first non-null argument, making it useful for substituting default values. NULLIF(expr1, expr2) returns NULL if the two arguments are equal, and expr1 otherwise — the inverse of COALESCE.
These constructs let you convert between null and non-null representations at query time without changing stored data.
Aggregation
Aggregate functions in SQL skip NULL values. SUM, AVG, MIN, MAX, and COUNT(column) all ignore nulls. COUNT(*) counts rows regardless of column values.
This means AVG(salary) computes the average of non-null salaries only, which may or may not match the analyst's intent.
If all values in a group are NULL, aggregate functions such as SUM, AVG, MIN, and MAX return NULL; COUNT(column) returns 0 because it counts non-null occurrences and there are none.
DISTINCT and GROUP BY
Despite NULL <> NULL yielding UNKNOWN in comparisons, DISTINCT and GROUP BY treat all nulls as belonging to the same group.
This is a deliberate departure from strict 3VL semantics for pragmatic reasons: without it, every NULL would form its own group, making grouping nearly useless for nullable columns.
Joins
In equi-joins, NULL values never match each other because the equality comparison yields UNKNOWN.
If you need to match nulls, you must use IS NOT DISTINCT FROM (SQL:2003) or database-specific equivalents such as <=> in MySQL or IS NOT DISTINCT FROM in PostgreSQL.
Indexes and Uniqueness
A UNIQUE constraint in SQL permits multiple NULL values in most implementations (PostgreSQL, SQLite, MySQL with InnoDB).
The rationale follows from 3VL: since NULL = NULL is UNKNOWN, two nulls are not provably equal, so no uniqueness violation can be established.
Oracle's behavior is nuanced: its standard B-tree indexes do not store entries where all indexed columns are NULL, so a single-column B-tree index effectively cannot enforce uniqueness across multiple NULLs.
In practice, Oracle's UNIQUE constraint therefore permits multiple NULL values — but through an indexing omission rather than an explicit semantic choice.
This means IS NULL predicates on a single-column B-tree index in Oracle may not use the index at all.
The standard workaround is a function-based index (e.g., indexing a CASE expression that maps NULL to a sentinel) or using a partial index in systems that support them.
SQL Server deviates from the SQL standard in the opposite direction: a unique index on a nullable column permits only a single NULL value.
SQL Server treats NULL as a distinct, known value for uniqueness purposes, meaning a second NULL is considered a duplicate of the first.
This behavior is a common source of portability bugs when migrating between database systems.
CHECK Constraints
A CHECK constraint is satisfied if the predicate evaluates to TRUE or UNKNOWN.
This is the opposite of WHERE clause semantics.
The reasoning: rejecting a row on the basis of incomplete information would be overly strict.
This means CHECK (salary > 0) permits NULL salaries.
If you want to reject nulls, you need an explicit NOT NULL constraint alongside the CHECK.
Implications for Query Optimizers
Three-valued logic complicates query optimization.
Many algebraic rewrites that are valid in two-valued logic become invalid in 3VL.
For example, the law of excluded middle (P OR NOT P = TRUE) does not hold in 3VL.
If salary is NULL, then salary > 50000 OR salary <= 50000 evaluates to UNKNOWN OR UNKNOWN, which is UNKNOWN, not TRUE.
Optimizers must account for this and cannot freely apply such simplifications to nullable expressions.
Double negation (NOT (NOT P) = P) does hold in 3VL, because NOT maps each of the three truth values consistently: NOT TRUE = FALSE, NOT FALSE = TRUE, and NOT UNKNOWN = UNKNOWN.
Applying NOT twice returns any truth value to itself, so this rewrite remains safe for optimizers.
Predicate pushdown through outer joins is another area where NULL semantics constrain optimization.
An outer join may introduce nulls for non-matching rows, and a predicate applied after the join may behave differently than the same predicate applied before the join, precisely because of the UNKNOWN truth value.
Alternatives and Critiques
C.J.
Date has argued extensively against nulls, proposing that missing information should be handled through explicit default values or through decomposition (separating optional attributes into their own relations).
His position is that three-valued logic introduces too much complexity and too many traps.
Others have proposed four-valued or even six-valued logics to distinguish different kinds of missing information.
These proposals have not gained traction in practical systems due to their complexity.
Some modern systems take a pragmatic middle ground.
Apache Spark SQL follows SQL semantics for nulls but provides functions like nullif, coalesce, and isnan to handle edge cases explicitly.
DuckDB and other analytical engines adhere to standard 3VL but invest in clear documentation to reduce confusion.
Key Points
NULLis not a value but a marker for missing information, and any comparison involvingNULLyieldsUNKNOWN, notTRUEorFALSE.- SQL operates under three-valued logic, where
WHEREclauses include only rows evaluating toTRUE, silently excludingUNKNOWNresults. NULL = NULLisUNKNOWN, which is why SQL providesIS NULL,IS NOT NULL, andIS NOT DISTINCT FROMfor null-safe comparisons.NOT INwith aNULLelement in the list — whether a literal or a value returned by a subquery — is a notorious bug source, as the entire predicate can never evaluate toTRUE.DISTINCTandGROUP BYpragmatically group all nulls together, deviating from strict 3VL semantics.CHECKconstraints acceptUNKNOWN(unlikeWHERE), meaning they do not reject null values without an explicitNOT NULLconstraint.- Algebraic identities valid in two-valued logic (such as the law of excluded middle) may not hold in 3VL, constraining query optimizer rewrites; double negation is a safe rewrite but excluded-middle elimination is not.
- SQL Server permits only one NULL in a unique index on a nullable column, treating NULL as a known distinct value for uniqueness purposes and diverging from the SQL standard and from the behavior of PostgreSQL, MySQL, and SQLite.
- Oracle permits multiple NULLs under a UNIQUE constraint because its B-tree indexes do not store all-NULL entries, an indexing artifact rather than an explicit semantic policy.
References
Codd, E.F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, Vol. 13, No. 6, June 1970, pp. 377–387.
Codd, E.F. "Missing Information (Applicable and Inapplicable) in Relational Databases." ACM SIGMOD Record, Vol. 15, No. 4, December 1986, pp. 53–78.
Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly Media, 2005.
Date, C.J. and Darwen, H. A Guide to the SQL Standard. Addison-Wesley, 4th Edition, 1997.
ISO/IEC 9075-2:2023. "Information technology — Database languages SQL — Part 2: Foundation (SQL/Foundation)."