In any partitioned database, the primary key determines how data is distributed across partitions.
But most real-world queries don't filter on the primary key alone.
Users search by color, location, status, tag, or any number of non-primary attributes.
Secondary indexes support these queries, and the way you partition those indexes has deep consequences for write amplification, query latency, and system complexity.
There are two fundamental strategies for partitioning secondary indexes: partition-local indexing (often called "document-partitioned" or "local" indexes) and term-partitioned indexing (often called "global" indexes).
Each strategy represents a different point in the trade-off space between write cost, read cost, and consistency.
The Problem
Consider a marketplace application where items are partitioned by item_id.
A user wants to find all items where color = red.
The color field is not the partition key, so red items may sit on any partition, and without an index each partition would have to scan all of its documents to find them.
A secondary index on color solves this, but the index itself must live somewhere in the cluster.
Where it lives, and how it's organized, defines the partitioning strategy.
Document-Partitioned (Local) Indexes
In a local index strategy, each partition maintains its own secondary index covering only the documents stored on that partition.
When a write arrives at partition 3 with {item_id: 7291, color: "red"}, partition 3 updates its local secondary index entry for color:red to include item_id: 7291.
The write is entirely local to the partition that owns the document.
Write Path
Writes are simple.
The partition that owns the document updates its local index as part of the same operation, often in the same transaction or write batch.
There is no cross-partition coordination.
Write latency is bounded by the performance of a single partition.
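The local write path can be sketched in a few lines. This is a minimal in-memory illustration, not any particular database's implementation: the class name LocalIndexPartition and its methods are hypothetical, and a real system would apply the document and index changes in one durable write batch.

```python
from collections import defaultdict


class LocalIndexPartition:
    """One data partition that indexes only the documents it stores (hypothetical sketch)."""

    def __init__(self):
        self.documents = {}                   # item_id -> document
        self.color_index = defaultdict(set)   # color -> item_ids on this partition

    def put(self, doc):
        item_id = doc["item_id"]
        old = self.documents.get(item_id)
        if old is not None:
            # Drop the stale index entry before adding the new one.
            self.color_index[old["color"]].discard(item_id)
        # Document and index are updated together, on this partition only.
        self.documents[item_id] = doc
        self.color_index[doc["color"]].add(item_id)

    def find_by_color(self, color):
        return sorted(self.color_index.get(color, set()))


p3 = LocalIndexPartition()
p3.put({"item_id": 7291, "color": "red"})
p3.put({"item_id": 7291, "color": "blue"})  # overwrite: the old "red" entry is removed
```

Because both structures live on the same partition, keeping them consistent needs no coordination with any other node.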
Read Path
Reads are expensive.
A query for color = red cannot know which partitions contain red items.
The query must be sent to every partition in the cluster.
Each partition searches its local index and returns matching results, which are then merged by a coordinator.
This pattern is called scatter-gather.
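A rough sketch of scatter-gather, assuming each partition's local index is a plain dictionary and using a thread pool to stand in for parallel network requests. The data and the function names (query_partition, scatter_gather) are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative local indexes, one dict per data partition: color -> item_ids.
local_indexes = [
    {"red": [11]},
    {"red": [42, 88], "blue": [7]},
    {},                      # this partition holds no red items but is still queried
    {"red": [305]},
]


def query_partition(index, color):
    """Stand-in for a network call to one partition's local index."""
    return index.get(color, [])


def scatter_gather(indexes, color):
    # Scatter: send the query to every partition in parallel.
    with ThreadPoolExecutor(max_workers=len(indexes)) as pool:
        partials = list(pool.map(lambda idx: query_partition(idx, color), indexes))
    # Gather: merge and sort the partial results at the coordinator.
    return sorted(item_id for partial in partials for item_id in partial)


result = scatter_gather(local_indexes, "red")
```

The coordinator cannot return until the slowest call finishes, which is where the problems below originate.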
Scatter-gather has several problems:
- Tail latency amplification. The overall query latency is determined by the slowest partition. With N partitions, the probability that at least one is slow grows with N.
- Fan-out load. Every secondary index query touches every data partition, even if most partitions return zero results.
- Merge overhead. Results from all partitions must be collected, merged, and possibly re-sorted at the coordinator.
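The tail latency effect can be made concrete with a small calculation. The sketch below assumes each partition is independently "slow" with the same probability p, which is a simplification; real partitions often have correlated slowdowns.

```python
def prob_any_slow(p_slow, n_partitions):
    """Probability that at least one of n independent partitions is slow."""
    return 1.0 - (1.0 - p_slow) ** n_partitions


# If each partition is slow 1% of the time:
for n in (1, 10, 100):
    print(n, round(prob_any_slow(0.01, n), 3))
```

Under these assumptions, a query fanned out to 100 partitions hits at least one slow partition about 63% of the time, even though each partition on its own is slow only 1% of the time.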
Where It's Used
Elasticsearch, MongoDB, and Cassandra (via its secondary index implementation) use local indexes as their default or primary strategy.
The appeal is operational simplicity on the write path and the ability to maintain index consistency within a single partition without distributed transactions.
Note on Cassandra's ALLOW FILTERING: ALLOW FILTERING is not a secondary index strategy. It instructs Cassandra to perform a full scan across partitions without an index, which is generally discouraged for production use. Cassandra's actual secondary indexes are local indexes. Cassandra also supports materialized views, which are a form of global indexing discussed below.
Term-Partitioned (Global) Indexes
In a global index strategy, the secondary index is itself partitioned by the indexed term rather than by the document's primary key.
All entries for color:red live in a single index partition (or a deterministic subset of partitions), regardless of which data partition holds the underlying document.
For example, the index might be range-partitioned so that colors a-m go to index partition 0, and colors n-z go to index partition 1.
Alternatively, the index term can be hash-partitioned for uniform distribution.
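Both routing schemes reduce to a function from an index term to an index partition number. The sketch below uses hypothetical function names, and CRC32 is chosen only because it is deterministic across processes; it is not a claim about what any real system uses.

```python
import zlib


def range_partition(term):
    """Range scheme from the example: terms starting a-m -> 0, everything else -> 1."""
    first = term[0].lower()
    return 0 if "a" <= first <= "m" else 1


def hash_partition(term, num_index_partitions):
    """Hash scheme: spreads terms roughly uniformly, but loses term ordering."""
    return zlib.crc32(term.encode("utf-8")) % num_index_partitions


print(range_partition("blue"), range_partition("red"))
print(hash_partition("red", 4))
```

Range partitioning keeps adjacent terms together, which helps range scans over the indexed value; hash partitioning gives up that locality in exchange for a more even spread of terms.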
Write Path
Writes become more complex.
When a document is written to data partition 3, the system must also update one or more index partitions, which may reside on different nodes.
If the document has multiple indexed fields (color, category, location), each field's index partition may be on a different node.
This means a single document write can require updates to multiple partitions.
This creates a fundamental consistency challenge.
Either you:
- Use a distributed transaction (2PC or similar) to atomically update the data partition and all relevant index partitions. This adds latency and reduces availability.
- Update the index asynchronously, accepting a window of inconsistency where the index doesn't reflect the latest writes.
Most systems choose the second option. Amazon DynamoDB's global secondary indexes, for example, are updated asynchronously and are documented as eventually consistent.
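The inconsistency window of the asynchronous option can be illustrated with a queue between the data partition and the index partition. This is a single-process sketch with hypothetical names (write, apply_pending); it is not how DynamoDB or any other system is implemented, and it ignores removing old index entries when a document's color changes.

```python
from collections import defaultdict, deque

data_partition = {}               # item_id -> document (stands in for the data partition)
global_index = defaultdict(set)   # color -> item_ids (stands in for a remote index partition)
pending = deque()                 # index updates acknowledged but not yet applied


def write(doc):
    """Store the document and enqueue the index update; acknowledge immediately."""
    data_partition[doc["item_id"]] = doc
    pending.append((doc["color"], doc["item_id"]))
    return "ack"


def apply_pending():
    """Background step: drain queued updates into the global index."""
    while pending:
        color, item_id = pending.popleft()
        global_index[color].add(item_id)


write({"item_id": 42, "color": "red"})
stale = set(global_index.get("red", set()))   # read before propagation: item 42 is missing
apply_pending()
fresh = set(global_index["red"])              # read after propagation: item 42 is present
```

Between the acknowledgement and the background step, a query through the index does not see the new document, even though a read by primary key would.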
Read Path
Reads are efficient.
A query for color = red can be routed to the single index partition responsible for that term.
The index partition returns a list of document IDs (and the data partitions they belong to), and the coordinator then fetches only the relevant documents.
There is no scatter-gather across all partitions.
Where It's Used
DynamoDB global secondary indexes, Google Cloud Spanner's secondary indexes (non-interleaved), and CockroachDB's secondary indexes all implement variants of global indexing.
Spanner also supports interleaved indexes, which co-locate index entries with their parent table rows as a storage optimization, but this is distinct from the index partitioning strategy: non-interleaved indexes in Spanner are the true globally partitioned variant.
The complexity of global index maintenance is managed at the system level to give users efficient read access patterns.
Walkthrough
The following walkthrough illustrates both strategies handling the same write and read operations on a system with 4 data partitions.
Setup
Data partitioned by hash(item_id) into partitions P0, P1, P2, P3.
Secondary index on "color".
Write: Insert {item_id: 42, color: "red"} where hash(42) -> P1
Local index approach:
1. Route write to P1 (owner of item_id 42).
2. P1 stores the document.
3. P1 updates its local index: local_index["red"].add(42).
4. Write complete. No other partitions involved.
Global index approach (hash-partitioned index, hash("red") -> P2):
1. Route write to P1 (owner of item_id 42).
2. P1 stores the document.
3. Determine index partition for "red": hash("red") -> P2.
4. Send index update to P2: global_index["red"].add(item_id=42, data_partition=P1).
5. P2 applies the index update.
6. Write complete (or acknowledged before step 5 if async).
Read: Find all items where color = "red"
Local index approach (scatter-gather):
1. Coordinator sends query {color: "red"} to P0, P1, P2, P3.
2. Each partition searches its local index for "red".
- P0 returns: {item_id: 11}
- P1 returns: {item_id: 42}, {item_id: 88}
- P2 returns: {}
- P3 returns: {item_id: 305}
3. Coordinator merges results: {11, 42, 88, 305}.
4. Latency = max(latency_P0, latency_P1, latency_P2, latency_P3).
Global index approach:
1. Coordinator computes hash("red") -> P2.
2. Coordinator sends query to P2 only.
3. P2 looks up global_index["red"]:
Returns: {(42, P1), (11, P0), (88, P1), (305, P3)}
4. Coordinator fetches documents from P0, P1, P3 as needed.
5. Latency = latency_index_lookup + max(latency of document fetches).
Note that the global approach still requires fetching actual documents from their data partitions (step 4), but this is a targeted fetch of known documents, not a broadcast.
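The global-index half of the walkthrough can be simulated in one process. In this sketch every node acts as both a data partition and an index partition, and CRC32 stands in for the hash function, so the actual partition numbers will not match the P1 and P2 used above. All names (Partition, insert, find_by_color) are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4


def partition_of(key):
    """Deterministic stand-in for hash(key) -> partition number."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_PARTITIONS


class Partition:
    def __init__(self):
        self.documents = {}                    # item_id -> document
        self.global_index = defaultdict(set)   # color -> {(item_id, data_partition)}


partitions = [Partition() for _ in range(NUM_PARTITIONS)]


def insert(doc):
    data_p = partition_of(doc["item_id"])      # partition that owns the document
    partitions[data_p].documents[doc["item_id"]] = doc
    index_p = partition_of(doc["color"])       # partition that owns the index term
    partitions[index_p].global_index[doc["color"]].add((doc["item_id"], data_p))


def find_by_color(color):
    index_p = partition_of(color)              # one index partition to consult
    postings = partitions[index_p].global_index.get(color, set())
    contacted = {index_p}
    docs = []
    for item_id, data_p in sorted(postings):   # targeted fetches, not a broadcast
        contacted.add(data_p)
        docs.append(partitions[data_p].documents[item_id])
    return docs, contacted


for item_id, color in [(11, "red"), (42, "red"), (88, "red"), (305, "red"), (7, "blue")]:
    insert({"item_id": item_id, "color": color})

docs, contacted = find_by_color("red")
```

The set contacted contains only the index partition plus the data partitions that actually hold matching documents, whereas the local approach always contacts all four.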
Trade-off Analysis
| Dimension | Local Index | Global Index |
|---|---|---|
| Write latency | Low (single partition) | Higher (cross-partition coordination) |
| Write complexity | Simple | Requires distributed update or async propagation |
| Read latency | High (scatter-gather across all partitions) | Low (single index partition lookup + targeted fetches) |
| Read amplification | O(N) data partitions queried per read | O(1) index partitions queried per read |
| Consistency | Strong (index co-located with data) | Eventual (if async) or expensive (if synchronous) |
| Rebalancing complexity | Index moves with data partition | Index must be independently rebalanced |
| Hot spot risk on writes | Low (writes distribute with data) | Higher (popular index terms concentrate updates) |
Hybrid and Practical Considerations
Real systems often blur the boundary between these two strategies or layer additional mechanisms on top.
Covering indexes store copies of frequently accessed columns in the index itself, eliminating the document fetch step for queries that need only those columns.
Spanner and CockroachDB support this.
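As a rough illustration of the idea (not of Spanner's or CockroachDB's storage format), a covering index can be modeled as an index whose entries carry copies of selected columns. The price and description fields and the function name build_covering_index are assumptions made for this example.

```python
from collections import defaultdict

# Illustrative documents; "price" and "description" are hypothetical fields.
documents = {
    42: {"item_id": 42, "color": "red", "price": 10, "description": "long text"},
    88: {"item_id": 88, "color": "red", "price": 25, "description": "long text"},
    7: {"item_id": 7, "color": "blue", "price": 5, "description": "long text"},
}


def build_covering_index(docs, indexed_field, stored_fields):
    """Index on indexed_field whose entries also carry copies of stored_fields."""
    index = defaultdict(dict)
    for item_id, doc in docs.items():
        index[doc[indexed_field]][item_id] = {f: doc[f] for f in stored_fields}
    return index


index = build_covering_index(documents, "color", ("price",))

# "Prices of all red items" is answered from the index alone, with no document fetch.
red_prices = {item_id: cols["price"] for item_id, cols in index["red"].items()}
```

The cost is extra storage and extra write work, since every stored column must be updated in the index whenever it changes in the document.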
Partitioned global indexes split the global index into ranges or hash buckets, so no single node must hold the entire index for a given term.
This mitigates hot spots but introduces the possibility of needing to consult multiple index partitions for range queries.
Materialized views in systems like Cassandra create what is effectively a new table partitioned by the indexed attribute.
This is a form of global index where the "index" is a full copy of the data, denormalized into a different partition scheme.
Index maintenance during rebalancing is significantly harder for global indexes.
When a data partition splits or moves, all global index partitions that reference documents in that partition must be updated.
Local indexes, by contrast, simply move with their data partition.
For read-heavy workloads with selective queries on non-primary attributes, global indexes generally perform better.
For write-heavy workloads or workloads where consistency is critical and distributed transactions are unacceptable, local indexes are the simpler and safer choice.
Key Points
- Local (document-partitioned) secondary indexes store index entries on the same partition as the document, making writes simple but requiring scatter-gather for reads.
- Global (term-partitioned) secondary indexes partition the index by the indexed value, enabling targeted reads but requiring cross-partition updates on writes.
- Scatter-gather reads suffer from tail latency amplification that worsens as the number of data partitions grows.
- Global indexes are typically maintained asynchronously to avoid distributed transactions, introducing a window of index staleness.
- The choice between local and global indexing is fundamentally a trade-off between write cost, read cost, and consistency guarantees.
- Hot spots can arise in global indexes when a small number of index terms receive a disproportionate share of writes.
- Real systems often combine both strategies and use techniques like covering indexes and materialized views to optimize specific access patterns.
- ALLOW FILTERING in Cassandra is not a secondary index mechanism; it is a directive that permits a full scan across partitions without an index, and it should not be confused with local index behavior.
References
Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly Media, 2017. Chapter 6: Partitioning.
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, et al. "Dynamo: Amazon's Highly Available Key-Value Store." SOSP, 2007. (Note: Describes the original Dynamo system; Amazon DynamoDB is a separately evolved managed service that shares design lineage.)
James C. Corbett, Jeffrey Dean, Michael Epstein, et al. "Spanner: Google's Globally-Distributed Database." OSDI, 2012.
Avinash Lakshman and Prashant Malik. "Cassandra: A Decentralized Structured Storage System." LADIS, 2009.
Amazon Web Services. "Working with Global Secondary Indexes in DynamoDB." AWS Documentation. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html