NULL Semantics and Three-Valued Logic

Introduction

SQL's NULL is one of the most misunderstood constructs in relational databases.
Unlike every other value in a database, NULL does not represent a known datum.
It represents the absence of a value, and this single design decision forces the entire logical foundation of SQL away from classical Boolean logic and into three-valued logic (3VL).
The consequences are pervasive: they affect query evaluation, indexing, aggregation, constraint checking, and join behavior.
Engineers who treat NULL as "just another value" routinely write queries that silently return incorrect results.

This article covers the formal semantics of NULL in SQL, the mechanics of three-valued logic, and the practical implications for query evaluation in real database systems.

The Origins of NULL

E.F.
Codd introduced the concept of missing information in his relational model.
In his later work, he actually proposed two kinds of nulls: one for "applicable but missing" (the value exists but is unknown) and one for "inapplicable" (the value does not apply to this entity).
SQL adopted a simpler design with a single NULL marker that conflates both meanings.

The SQL standard (ISO/IEC 9075) defines NULL not as a value but as a marker indicating the absence of a value.
This distinction matters. NULL is not zero, not an empty string, and not a special sentinel.
It belongs to no data type.
Every data type in SQL is extended with the possibility of being NULL, unless a NOT NULL constraint prohibits it.

Three-Valued Logic

Classical Boolean logic has two truth values: TRUE and FALSE.
When NULL enters a comparison, the result is neither TRUE nor FALSE but UNKNOWN.
This third truth value gives rise to three-valued logic.
SQL's WHERE clause filters rows for which the predicate evaluates to TRUE.
Rows producing UNKNOWN are excluded, just like rows producing FALSE.

Truth Tables

The standard logical connectives are extended as follows.

AND:

A	B	A AND B
TRUE	TRUE	TRUE
TRUE	FALSE	FALSE
TRUE	UNKNOWN	UNKNOWN
FALSE	TRUE	FALSE
FALSE	FALSE	FALSE
FALSE	UNKNOWN	FALSE
UNKNOWN	TRUE	UNKNOWN
UNKNOWN	FALSE	FALSE
UNKNOWN	UNKNOWN	UNKNOWN

OR:

A	B	A OR B
TRUE	TRUE	TRUE
TRUE	FALSE	TRUE
TRUE	UNKNOWN	TRUE
FALSE	TRUE	TRUE
FALSE	FALSE	FALSE
FALSE	UNKNOWN	UNKNOWN
UNKNOWN	TRUE	TRUE
UNKNOWN	FALSE	UNKNOWN
UNKNOWN	UNKNOWN	UNKNOWN

NOT:

A	NOT A
TRUE	FALSE
FALSE	TRUE
UNKNOWN	UNKNOWN

The key intuition: FALSE AND UNKNOWN is FALSE (because regardless of the unknown value, the conjunction cannot be true). TRUE OR UNKNOWN is TRUE (because regardless of the unknown value, the disjunction is already satisfied).
In all other cases involving UNKNOWN, the result is UNKNOWN.

Walkthrough

diagram-1 — Row-level evaluation flow for NULLs with OR and WHERE

Consider a table employees with columns name, department, and salary, where some rows have NULL in the salary column.

Query:

SELECT name FROM employees WHERE salary > 50000;

Step-by-step evaluation for a row where salary is NULL:

Evaluate the expression salary > 50000.
Substitute: NULL > 50000.
Any comparison with NULL yields UNKNOWN.
The WHERE clause requires TRUE to include the row.
UNKNOWN is not TRUE, so the row is excluded.

Query:

SELECT name FROM employees WHERE salary > 50000 OR department = 'Engineering';

Evaluation for a row where salary is NULL and department is 'Engineering':

NULL > 50000 evaluates to UNKNOWN.
'Engineering' = 'Engineering' evaluates to TRUE.
UNKNOWN OR TRUE evaluates to TRUE (see OR truth table).
The row is included.

Evaluation for a row where salary is NULL and department is NULL:

NULL > 50000 evaluates to UNKNOWN.
NULL = 'Engineering' evaluates to UNKNOWN.
UNKNOWN OR UNKNOWN evaluates to UNKNOWN.
The row is excluded.

The NOT IN trap:

SELECT name FROM employees WHERE department NOT IN ('Sales', NULL);

This query returns zero rows, regardless of data.
Here is why:

NOT IN (...) expands to department <> 'Sales' AND department <> NULL.
department <> NULL evaluates to UNKNOWN for every row, because any comparison against NULL yields UNKNOWN.
anything AND UNKNOWN is either FALSE or UNKNOWN, never TRUE.
No row passes the filter.

This trap applies equally when the list comes from a subquery that returns any NULL value — not just when a literal NULL appears in a static list.
It is one of the most common NULL-related bugs in production SQL.

Practical Consequences

Equality and the IS Operator

NULL = NULL evaluates to UNKNOWN, not TRUE.
Two nulls are not considered equal.
This is logically consistent (two unknown values have no reason to be the same), but it surprises engineers accustomed to programming languages where null == null is true.

To test for NULL, SQL provides the IS NULL and IS NOT NULL predicates, which always return TRUE or FALSE, never UNKNOWN.

Handling NULLs Explicitly

SQL provides several functions for working with nulls explicitly. COALESCE(expr1, expr2, ...) returns the first non-null argument, making it useful for substituting default values. NULLIF(expr1, expr2) returns NULL if the two arguments are equal, and expr1 otherwise — the inverse of COALESCE.
These constructs let you convert between null and non-null representations at query time without changing stored data.

Aggregation

Aggregate functions in SQL skip NULL values. SUM, AVG, MIN, MAX, and COUNT(column) all ignore nulls. COUNT(*) counts rows regardless of column values.
This means AVG(salary) computes the average of non-null salaries only, which may or may not match the analyst's intent.
If all values in a group are NULL, aggregate functions such as SUM, AVG, MIN, and MAX return NULL; COUNT(column) returns 0 because it counts non-null occurrences and there are none.

DISTINCT and GROUP BY

Despite NULL <> NULL yielding UNKNOWN in comparisons, DISTINCT and GROUP BY treat all nulls as belonging to the same group.
This is a deliberate departure from strict 3VL semantics for pragmatic reasons: without it, every NULL would form its own group, making grouping nearly useless for nullable columns.

Joins

diagram-2 — Equi-join behavior showing NULLs do not match each other

In equi-joins, NULL values never match each other because the equality comparison yields UNKNOWN.
If you need to match nulls, you must use IS NOT DISTINCT FROM (SQL:2003) or database-specific equivalents such as <=> in MySQL or IS NOT DISTINCT FROM in PostgreSQL.

Indexes and Uniqueness

A UNIQUE constraint in SQL permits multiple NULL values in most implementations (PostgreSQL, SQLite, MySQL with InnoDB).
The rationale follows from 3VL: since NULL = NULL is UNKNOWN, two nulls are not provably equal, so no uniqueness violation can be established.

Oracle's behavior is nuanced: its standard B-tree indexes do not store entries where all indexed columns are NULL, so a single-column B-tree index effectively cannot enforce uniqueness across multiple NULLs.
In practice, Oracle's UNIQUE constraint therefore permits multiple NULL values — but through an indexing omission rather than an explicit semantic choice.
This means IS NULL predicates on a single-column B-tree index in Oracle may not use the index at all.
The standard workaround is a function-based index (e.g., indexing a CASE expression that maps NULL to a sentinel) or using a partial index in systems that support them.

SQL Server deviates from the SQL standard in the opposite direction: a unique index on a nullable column permits only a single NULL value.
SQL Server treats NULL as a distinct, known value for uniqueness purposes, meaning a second NULL is considered a duplicate of the first.
This behavior is a common source of portability bugs when migrating between database systems.

CHECK Constraints

A CHECK constraint is satisfied if the predicate evaluates to TRUE or UNKNOWN.
This is the opposite of WHERE clause semantics.
The reasoning: rejecting a row on the basis of incomplete information would be overly strict.
This means CHECK (salary > 0) permits NULL salaries.
If you want to reject nulls, you need an explicit NOT NULL constraint alongside the CHECK.

Implications for Query Optimizers

Three-valued logic complicates query optimization.
Many algebraic rewrites that are valid in two-valued logic become invalid in 3VL.

For example, the law of excluded middle (P OR NOT P = TRUE) does not hold in 3VL.
If salary is NULL, then salary > 50000 OR salary <= 50000 evaluates to UNKNOWN OR UNKNOWN, which is UNKNOWN, not TRUE.
Optimizers must account for this and cannot freely apply such simplifications to nullable expressions.

Double negation (NOT (NOT P) = P) does hold in 3VL, because NOT maps each of the three truth values consistently: NOT TRUE = FALSE, NOT FALSE = TRUE, and NOT UNKNOWN = UNKNOWN.
Applying NOT twice returns any truth value to itself, so this rewrite remains safe for optimizers.

Predicate pushdown through outer joins is another area where NULL semantics constrain optimization.
An outer join may introduce nulls for non-matching rows, and a predicate applied after the join may behave differently than the same predicate applied before the join, precisely because of the UNKNOWN truth value.

Alternatives and Critiques

C.J.
Date has argued extensively against nulls, proposing that missing information should be handled through explicit default values or through decomposition (separating optional attributes into their own relations).
His position is that three-valued logic introduces too much complexity and too many traps.

Others have proposed four-valued or even six-valued logics to distinguish different kinds of missing information.
These proposals have not gained traction in practical systems due to their complexity.

Some modern systems take a pragmatic middle ground.
Apache Spark SQL follows SQL semantics for nulls but provides functions like nullif, coalesce, and isnan to handle edge cases explicitly.
DuckDB and other analytical engines adhere to standard 3VL but invest in clear documentation to reduce confusion.

Key Points

NULL is not a value but a marker for missing information, and any comparison involving NULL yields UNKNOWN, not TRUE or FALSE.
SQL operates under three-valued logic, where WHERE clauses include only rows evaluating to TRUE, silently excluding UNKNOWN results.
NULL = NULL is UNKNOWN, which is why SQL provides IS NULL, IS NOT NULL, and IS NOT DISTINCT FROM for null-safe comparisons.
NOT IN with a NULL element in the list — whether a literal or a value returned by a subquery — is a notorious bug source, as the entire predicate can never evaluate to TRUE.
DISTINCT and GROUP BY pragmatically group all nulls together, deviating from strict 3VL semantics.
CHECK constraints accept UNKNOWN (unlike WHERE), meaning they do not reject null values without an explicit NOT NULL constraint.
Algebraic identities valid in two-valued logic (such as the law of excluded middle) may not hold in 3VL, constraining query optimizer rewrites; double negation is a safe rewrite but excluded-middle elimination is not.
SQL Server permits only one NULL in a unique index on a nullable column, treating NULL as a known distinct value for uniqueness purposes and diverging from the SQL standard and from the behavior of PostgreSQL, MySQL, and SQLite.
Oracle permits multiple NULLs under a UNIQUE constraint because its B-tree indexes do not store all-NULL entries, an indexing artifact rather than an explicit semantic policy.

References

Codd, E.F. "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, Vol. 13, No. 6, June 1970, pp. 377–387.

Codd, E.F. "Missing Information (Applicable and Inapplicable) in Relational Databases." ACM SIGMOD Record, Vol. 15, No. 4, December 1986, pp. 53–78.

Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly Media, 2005.

Date, C.J. and Darwen, H. A Guide to the SQL Standard. Addison-Wesley, 4th Edition, 1997.

ISO/IEC 9075-2:2023. "Information technology — Database languages SQL — Part 2: Foundation (SQL/Foundation)."