Query optimization is one of the most consequential subsystems in a relational database engine.
A declarative SQL statement admits many physically distinct execution plans, and the performance difference between the best and worst plan can span several orders of magnitude.
Cost-based query optimization (CBO) addresses this by modeling the resource consumption of candidate plans and selecting the one with the lowest estimated cost.
This article covers the core machinery: plan enumeration, cost modeling, cardinality estimation, and the search strategies that tie them together.
The Optimization Problem
A SQL query specifies what data to retrieve, not how to retrieve it.
The optimizer's job is to choose the "how." For a query joining N tables, the number of distinct left-deep join orderings is N! (N!/2 if the commutativity of the bottommost join is factored out).
Bushy trees admit even more shapes: the total number of distinct binary tree topologies with N labeled leaves (counting both shape and ordering) grows super-exponentially, reaching on the order of tens of billions for a 10-table join.
Each join can use different physical operators (nested loop, hash join, sort-merge join), each table can be accessed via different paths (sequential scan, index scan, index-only scan), and predicates can be pushed down or deferred.
The combinatorial explosion is real and motivates the use of systematic search algorithms.
A cost-based optimizer navigates this space by assigning a scalar cost estimate to each candidate plan and searching for the plan with the minimum cost.
The cost estimate is a function of estimated cardinalities, available access paths, and a model of the system's I/O and CPU characteristics.
Plan Representation
Optimizers reason over trees of relational algebra operators.
A plan tree's leaves are base table access operations, and internal nodes are joins, aggregations, sorts, and other operators.
Two important structural distinctions:
- Left-deep trees: every right child of a join is a base table. These are pipeline-friendly and the traditional focus of System R-style optimizers.
- Bushy trees: joins can appear on both sides of a join node. These expose more parallelism and can be necessary for optimal plans, but they vastly expand the search space.
Each logical operator (e.g., "join R and S on R.a = S.b") maps to one or more physical operators (hash join, merge join, nested loop join), each with different cost characteristics.
The optimizer must choose both the logical structure (join order, predicate placement) and the physical implementation.
Cardinality Estimation
Cardinality estimation is the foundation on which cost models rest.
An error in a node's cardinality estimate propagates and compounds through every operator above it in the plan tree.
The optimizer needs to estimate the number of rows produced by each operator node.
Selectivity of Predicates
For a filter predicate col = value on a table with N rows, the selectivity under a uniformity assumption is 1 / NDV(col), where NDV is the number of distinct values.
The estimated output cardinality is N / NDV(col).
For range predicates like col > value, the optimizer typically assumes uniform distribution within the column's min/max range and computes the fraction of the range that satisfies the predicate.
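These two textbook estimates can be sketched in a few lines of Python. The `ColumnStats` container and function names are illustrative, not drawn from any real optimizer; the code simply applies the uniformity assumptions described above.

```python
from dataclasses import dataclass

@dataclass
class ColumnStats:
    n_rows: int     # table cardinality
    ndv: int        # number of distinct values in the column
    min_val: float  # column minimum
    max_val: float  # column maximum

def eq_selectivity(stats: ColumnStats) -> float:
    # col = value: assume each distinct value is equally frequent
    return 1.0 / stats.ndv

def range_selectivity(stats: ColumnStats, value: float) -> float:
    # col > value: fraction of the [min, max] range above the constant,
    # assuming values are uniformly spread over that range
    if value <= stats.min_val:
        return 1.0
    if value >= stats.max_val:
        return 0.0
    return (stats.max_val - value) / (stats.max_val - stats.min_val)

stats = ColumnStats(n_rows=1_000_000, ndv=50_000, min_val=0.0, max_val=100.0)
print(round(stats.n_rows * eq_selectivity(stats)))         # 20 rows for col = v
print(round(stats.n_rows * range_selectivity(stats, 90)))  # 100000 rows for col > 90
```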
Histograms
Real data is rarely uniform.
Most production systems maintain histograms (equi-width or equi-depth) on column value distributions.
Equi-depth (equi-height) histograms partition the value domain into buckets containing roughly equal numbers of rows, giving better resolution in dense regions of the distribution.
PostgreSQL, for example, builds most-common-values lists alongside equi-depth histograms.
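A minimal equi-depth histogram can be sketched as follows. This is an illustrative toy, not PostgreSQL's implementation: boundaries are taken directly from the sorted data, and the estimator assumes uniformity only within a bucket. On skewed data it beats a plain min/max uniform estimate by a wide margin.

```python
import bisect

def build_equi_depth(values, n_buckets):
    """Return bucket boundaries [b0, ..., bn] so each bucket holds ~equal rows."""
    s = sorted(values)
    bounds = [s[min(i * len(s) // n_buckets, len(s) - 1)] for i in range(n_buckets)]
    bounds.append(s[-1])
    return bounds

def estimate_gt(bounds, n_rows, value):
    """Estimate rows with col > value, assuming uniformity inside a bucket."""
    n_buckets = len(bounds) - 1
    rows_per_bucket = n_rows / n_buckets
    i = bisect.bisect_right(bounds, value) - 1     # bucket containing value
    i = max(0, min(i, n_buckets - 1))
    lo, hi = bounds[i], bounds[i + 1]
    frac_in_bucket = 0.0 if hi == lo else (hi - value) / (hi - lo)
    full_buckets_above = n_buckets - i - 1
    return rows_per_bucket * (frac_in_bucket + full_buckets_above)

# Heavily skewed column: 90% of rows share one value
values = [1] * 900 + list(range(2, 102))
bounds = build_equi_depth(values, 10)
print(round(estimate_gt(bounds, len(values), 50)))
# ~52 rows; the true count is 51, while a plain min/max uniform
# estimate would say roughly 510
```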
Join Cardinality
For an equi-join R.a = S.b, a standard estimate (assuming value containment and uniformity) is:
|R ⋈ S| = |R| * |S| / max(NDV(R.a), NDV(S.b))
This formula assumes that the distinct values of the column with the smaller NDV are fully contained in the domain of the column with the larger NDV — that is, every key in the smaller domain finds a match in the larger domain.
Violations of this assumption (e.g., many non-matching keys, or neither column's value set being a subset of the other's) are a major source of estimation error in practice.
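The formula is a one-liner; a foreign-key join illustrates why it is intuitive under containment. The statistics below are invented for the example.

```python
def join_cardinality(n_r, n_s, ndv_a, ndv_b):
    # |R join S| on R.a = S.b under containment + uniformity:
    # each of the max(ndv_a, ndv_b) distinct values matches
    # n_r/ndv_a rows of R and n_s/ndv_b rows of S.
    return n_r * n_s / max(ndv_a, ndv_b)

# FK join: orders (1M rows, 100K distinct customer ids) to customers (100K rows).
# Every order matches exactly one customer, so the estimate is 1M rows.
print(join_cardinality(1_000_000, 100_000, 100_000, 100_000))  # 1000000.0
```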
The Independence Assumption
When multiple predicates apply, most optimizers assume statistical independence and multiply their selectivities.
If predicate P1 has selectivity 0.1 and P2 has selectivity 0.05, the combined selectivity is estimated as 0.005.
Correlated columns (e.g., city and zip code) violate this assumption badly, sometimes by orders of magnitude.
Multi-column statistics and more recently learned cardinality estimators attempt to address this, but the independence assumption remains the default in most systems.
Cost Model
The cost model translates cardinality estimates and operator choices into a scalar cost value.
A typical model accounts for:
- I/O cost: pages read from disk (sequential vs. random). Sequential I/O is cheaper per page than random I/O, typically by a factor of 4x or more in cost model parameters.
- CPU cost: per-tuple processing cost for evaluating predicates, hashing, and comparing keys.
- Memory cost: whether an operator (e.g., hash join build side) fits in the allocated memory budget, or must spill to disk.
For example, a simplified cost for a sequential scan of a table with P pages and T tuples might be:
cost_seq_scan = P * seq_page_cost + T * cpu_tuple_cost
An index scan on a selective predicate might estimate:
cost_index_scan = T_qual * random_page_cost + T_qual * cpu_tuple_cost + index_traversal_cost
where T_qual = selectivity * T is the estimated number of qualifying tuples.
Each qualifying tuple may require a separate random page fetch in the worst case (no clustering), so cost grows with the number of qualifying tuples rather than as a simple fraction of total pages.
In practice, PostgreSQL's formula also accounts for index correlation (physical ordering of tuples relative to the index order), which can reduce random I/O when the table is well-clustered on the indexed column.
The crossover point where an index scan becomes cheaper than a sequential scan depends on selectivity.
For predicates that qualify a large fraction of the table, sequential scan wins because per-tuple random I/O is expensive.
PostgreSQL exposes these cost model parameters (seq_page_cost, random_page_cost, cpu_tuple_cost, etc.) as tunable configuration values, which allows DBAs to calibrate the model to their hardware.
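The crossover behavior can be sketched with the two formulas above and PostgreSQL-style default parameters (seq_page_cost=1.0, random_page_cost=4.0, cpu_tuple_cost=0.01). The `index_traversal_cost` constant and table sizes are made up for illustration; real models, including PostgreSQL's, are considerably more detailed.

```python
SEQ_PAGE_COST, RANDOM_PAGE_COST, CPU_TUPLE_COST = 1.0, 4.0, 0.01

def cost_seq_scan(pages, tuples):
    return pages * SEQ_PAGE_COST + tuples * CPU_TUPLE_COST

def cost_index_scan(tuples, selectivity, index_traversal_cost=10.0):
    t_qual = selectivity * tuples
    # worst case: one random page fetch per qualifying tuple (no clustering)
    return t_qual * RANDOM_PAGE_COST + t_qual * CPU_TUPLE_COST + index_traversal_cost

pages, tuples = 10_000, 1_000_000
for sel in (0.0001, 0.001, 0.01, 0.1):
    seq, idx = cost_seq_scan(pages, tuples), cost_index_scan(tuples, sel)
    print(f"sel={sel}: seq={seq:.0f} idx={idx:.0f} -> {'index' if idx < seq else 'seq'}")
# With these toy numbers, the index wins below roughly 0.5% selectivity
```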
Walkthrough
The following walkthrough illustrates the System R dynamic programming approach for join order enumeration, which remains the backbone of most commercial optimizers.
System R Dynamic Programming Algorithm
Input: A set of N base relations {R1, R2, ..., RN} and a set of join predicates.
Output: The optimal join order and operator choices for the query.
// Phase 1: Single-relation plans
For each relation Ri:
    Enumerate all access paths (seq scan, each available index scan)
    Retain the cheapest plan for each "interesting order"
    (An interesting order is a sort order useful for later ORDER BY,
     GROUP BY, or merge join operators)

// Phase 2: Build optimal plans bottom-up by subset size
For k = 2 to N:
    For each subset S of size k:
        For each way to split S into (S1, S2) where:
            - S1 and S2 are non-empty and disjoint
            - S1 ∪ S2 = S
            - There exists a join predicate connecting S1 and S2
            - (System R restricts S2 to a single relation, yielding left-deep trees)
            For each physical join operator (hash, merge, nested loop):
                estimated_cost = cost(best_plan(S1)) + cost(best_plan(S2))
                                 + cost(joining S1 and S2)
                If estimated_cost < best_plan(S).cost:
                    best_plan(S) = this plan

// Phase 3: Select final plan
Return best_plan({R1, R2, ..., RN})
Interesting orders are a key insight from the original System R paper.
A plan that produces output sorted on a join column might be slightly more expensive than the cheapest plan for a subexpression, but it avoids a sort later.
The algorithm retains the cheapest plan per interesting order for each subset, not just the single cheapest plan.
This is what makes the algorithm correct despite pruning.
The time complexity is O(3^N) for subset enumeration.
This bound arises from counting, over all subsets S of the N relations, the number of ways to partition S into two disjoint non-empty parts (S1, S2).
Equivalently, for each pair (S1, S2), each relation independently falls into one of three categories: it belongs to S1, it belongs to S2, or it belongs to neither — giving at most 3^N combinations in total.
This is tractable for queries up to roughly 15–20 tables.
Beyond that, heuristic or randomized approaches become necessary.
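The dynamic program above can be condensed into a runnable sketch. This toy version is left-deep only, ignores interesting orders and physical operator choice, and uses the sum of intermediate result sizes as its cost function; the table cardinalities and join selectivities are invented for the example.

```python
from itertools import combinations

def optimize(card, sel):
    """card: {name: rows}; sel: {(a, b): selectivity} for joinable pairs (sorted keys)."""
    rels = sorted(card)
    # best[subset] = (cost, estimated rows, plan string)
    best = {frozenset([r]): (0.0, float(card[r]), r) for r in rels}
    for k in range(2, len(rels) + 1):
        for s in map(frozenset, combinations(rels, k)):
            for r in s:                       # right side: a single relation
                s1 = s - {r}
                if s1 not in best:
                    continue
                c1, rows1, plan1 = best[s1]
                f, joinable = 1.0, False
                for a in s1:                  # apply every predicate linking s1 and r
                    key = tuple(sorted((a, r)))
                    if key in sel:
                        f *= sel[key]
                        joinable = True
                if not joinable:
                    continue                  # skip cartesian products
                rows = rows1 * card[r] * f
                cost = c1 + rows              # cost = sum of intermediate result sizes
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, rows, f"({plan1} JOIN {r})")
    return best[frozenset(rels)]

card = {"R": 1000, "S": 100, "T": 10}
sel = {("R", "S"): 0.01, ("S", "T"): 0.1}
print(optimize(card, sel))   # joins S and T first, then R
```

Because the best size-(k-1) plans are memoized, each subset is costed once, giving the O(3^N) subset/split enumeration rather than N! full orderings.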
Handling Large Search Spaces
For queries exceeding the practical limits of dynamic programming, optimizers employ alternative strategies:
- Greedy heuristics: build the join order incrementally, always adding the next table that produces the cheapest two-way join. Fast (O(N^2) per step) but can miss globally optimal plans.
- Randomized search: algorithms like iterative improvement or simulated annealing explore the plan space by making random perturbations (swapping join order, changing operator) and accepting improvements. These work well for very large queries (50+ tables) common in decision support workloads.
- The Cascades framework: used by SQL Server and other systems, Cascades uses a top-down, memoization-based approach with transformation rules. It explores the search space lazily, applying pruning via branch-and-bound. This is more flexible than System R-style bottom-up enumeration and handles non-join operators (aggregation pushdown, subquery decorrelation) more naturally.
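The greedy heuristic from the list above can be sketched as follows: repeatedly merge the pair of plan fragments whose join yields the smallest intermediate result. The statistics format is a toy invented for this example, and the join graph is assumed to be connected.

```python
def greedy_order(card, sel):
    """card: {name: rows}; sel: {(a, b): selectivity} for joinable pairs (sorted keys)."""
    frags = {frozenset([r]): (float(card[r]), r) for r in card}  # (rows, plan)
    while len(frags) > 1:
        best = None
        for s1 in frags:
            for s2 in frags:
                if s1 == s2:
                    continue
                f, joinable = 1.0, False
                for a in s1:
                    for b in s2:
                        key = tuple(sorted((a, b)))
                        if key in sel:
                            f *= sel[key]
                            joinable = True
                if not joinable:
                    continue
                rows = frags[s1][0] * frags[s2][0] * f
                if best is None or rows < best[0]:
                    best = (rows, s1, s2)
        rows, s1, s2 = best                   # greedily commit to the cheapest pair
        frags[s1 | s2] = (rows, f"({frags[s1][1]} JOIN {frags[s2][1]})")
        del frags[s1], frags[s2]
    return next(iter(frags.values()))

card = {"R": 1000, "S": 100, "T": 10}
sel = {("R", "S"): 0.01, ("S", "T"): 0.1}
print(greedy_order(card, sel))
```

Each step commits irrevocably to one pair, which is what makes the heuristic fast but able to miss globally optimal plans.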
Limitations and Practical Considerations
Cost-based optimization is only as good as its estimates.
In practice, cardinality estimation errors are the dominant source of plan quality problems.
Errors compound multiplicatively through a plan tree: if each join's cardinality is off by 3x, a 5-join query can see a compounded error of 3^5 = 243x at the top of the tree.
Adaptive query execution addresses this by monitoring actual cardinalities at runtime and re-optimizing or adjusting plans when estimates are found to be significantly wrong.
Oracle's adaptive plans and Apache Spark's adaptive query execution are mature examples of this approach.
PostgreSQL has made incremental progress toward runtime feedback (e.g., parallel query adjustments and partition pruning improvements), but full mid-execution re-optimization comparable to Oracle's implementation is not a standard PostgreSQL feature as of recent releases.
Another practical consideration is plan stability.
Small changes in statistics can cause the optimizer to choose a radically different plan, leading to performance regressions.
Some systems offer plan baselines or hints to mitigate this.
Key Points
- Cost-based optimization selects execution plans by estimating and minimizing a scalar cost function over the space of valid plans.
- Cardinality estimation, built on column statistics and histograms, is the single most critical input to the cost model, and the primary source of plan quality failures.
- The System R dynamic programming algorithm enumerates join orders in O(3^N) time using the principle of optimality, retaining the best plan per interesting order for each subset of relations.
- Cost models combine I/O estimates (distinguishing sequential from random access), CPU costs, and memory constraints into a single comparable metric.
- The independence assumption for multi-predicate selectivity is a known and persistent source of large estimation errors, especially on correlated columns.
- For queries joining many tables, randomized search or top-down frameworks like Cascades replace exhaustive enumeration.
- Adaptive query execution partially compensates for estimation errors by adjusting or re-optimizing plans based on observed runtime cardinalities.
References
P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price. Access Path Selection in a Relational Database Management System. Proceedings of ACM SIGMOD, 1979.
G. Graefe. The Cascades Framework for Query Optimization. IEEE Data Engineering Bulletin, 18(3), 1995.
V. Leis, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, T. Neumann. How Good Are Query Optimizers, Really? Proceedings of the VLDB Endowment, 9(3), 2015.
S. Chaudhuri. An Overview of Query Optimization in Relational Systems. Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), 1998.
Y. Ioannidis. The History of Histograms (abridged). Proceedings of the 29th International Conference on Very Large Data Bases (VLDB), 2003.