Introduction
Relational databases maintain a write-ahead log (WAL) as the backbone of crash recovery and replication.
Physical replication streams these raw WAL bytes to replicas, which replay them block by block.
This approach is efficient but tightly couples the replica to the primary's storage format, version, and even architecture.
Logical decoding takes a fundamentally different approach: it reads the WAL and reconstructs high-level, row-oriented change events (INSERT, UPDATE, DELETE) that are independent of the physical storage representation.
Output plugins sit at the boundary of this pipeline, transforming decoded changes into application-consumable formats such as JSON, Protobuf, or SQL statements.
PostgreSQL formalized this architecture in version 9.4 (2014), and it remains the most mature open-source implementation of pluggable logical decoding.
This article examines the internals of logical decoding, the role and design of output plugins, and the engineering tradeoffs involved.
How Logical Decoding Works
WAL as the Source of Truth
PostgreSQL's WAL contains physical change records: page-level modifications, heap tuple insertions, index updates, and transaction control records.
Logical decoding must reverse-engineer row-level semantics from this physical stream.
The process relies on several components:
- WAL reader: Reads raw WAL records sequentially from disk or streaming replication connections.
- Decoder (ReorderBuffer): Collects changes per transaction, reorders them by commit LSN, and assembles complete, committed transactions before emitting them.
- Catalog snapshot: Maintains a historical view of the system catalog so that tuple data can be decoded using the correct schema, even if the schema has since changed.
- Output plugin: Receives decoded row changes and serializes them into the desired wire format.
The Reorder Buffer
WAL records are written in the order operations execute, not in commit order.
A long-running transaction's changes may be interleaved with records from many other transactions.
The ReorderBuffer collects changes, groups them by transaction ID, and only emits a transaction's changes once a COMMIT record is encountered — at which point all buffered (or spilled) changes for that transaction are emitted in operation order.
Aborted transactions are discarded entirely, meaning logical decoding only produces committed, consistent change streams.
For large transactions, the ReorderBuffer spills to disk to avoid unbounded memory consumption, potentially long before the COMMIT record is seen.
PostgreSQL controls this via the logical_decoding_work_mem parameter.
This spill mechanism is critical for production workloads where bulk data loads or long-running transactions can generate millions of change records before committing.
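The buffering-and-spill logic described above can be modeled in a short sketch. This is a toy Python analogue, not PostgreSQL's C implementation: changes accumulate per transaction ID, spill to a temporary file once a memory budget (standing in for logical_decoding_work_mem) is exceeded, are emitted in operation order on commit, and are discarded entirely on abort.

```python
import json
import tempfile

class ReorderBuffer:
    """Toy model of transaction reassembly: buffer per-xid changes,
    spill large transactions to disk, emit only on COMMIT."""

    def __init__(self, work_mem_bytes=1024):
        self.work_mem_bytes = work_mem_bytes   # stand-in for logical_decoding_work_mem
        self.in_memory = {}                    # xid -> list of change dicts
        self.mem_used = {}                     # xid -> approx. bytes buffered in memory
        self.spill_files = {}                  # xid -> temp file holding spilled changes

    def add_change(self, xid, change):
        self.in_memory.setdefault(xid, []).append(change)
        self.mem_used[xid] = self.mem_used.get(xid, 0) + len(json.dumps(change))
        if self.mem_used[xid] > self.work_mem_bytes:
            self._spill(xid)

    def _spill(self, xid):
        # Append buffered changes to a per-transaction spill file, preserving order.
        f = self.spill_files.setdefault(xid, tempfile.TemporaryFile(mode="w+"))
        for change in self.in_memory.pop(xid, []):
            f.write(json.dumps(change) + "\n")
        self.mem_used[xid] = 0

    def commit(self, xid):
        # Replay spilled changes first (they are older), then in-memory ones.
        changes = []
        f = self.spill_files.pop(xid, None)
        if f is not None:
            f.seek(0)
            changes.extend(json.loads(line) for line in f)
            f.close()
        changes.extend(self.in_memory.pop(xid, []))
        self.mem_used.pop(xid, None)
        return changes            # complete transaction, in operation order

    def abort(self, xid):
        # Aborted transactions are discarded entirely; nothing is ever emitted.
        f = self.spill_files.pop(xid, None)
        if f is not None:
            f.close()
        self.in_memory.pop(xid, None)
        self.mem_used.pop(xid, None)
```

The key invariants mirror the real ReorderBuffer: nothing reaches the output plugin until COMMIT, and a large transaction's memory footprint stays bounded regardless of its size.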
Catalog Snapshots and Schema Resolution
Decoding a raw heap tuple requires knowledge of the table's schema at the time the tuple was written.
If a column is added or dropped between the time a change was made and the time it is decoded, using the current catalog would produce incorrect results.
PostgreSQL addresses this in two related ways.
First, when a replication slot is created, the decoder builds a consistent snapshot that can be exported to the creating session, establishing a starting point a new subscriber can use to take an initial table dump before streaming begins (serialized snapshots are also persisted under pg_logical/snapshots/ so decoding can restart without rebuilding them).
Second, and more importantly for ongoing decoding, system catalog tables are themselves WAL-logged.
The logical decoding machinery replays catalog WAL records into a private, versioned catalog view, allowing it to reconstruct the schema as it existed at any given LSN.
This means the decoder can always resolve the correct tuple descriptor for any change it processes, even if the table has been altered since.
This design introduces a constraint: tables must have a REPLICA IDENTITY configured (default, full, or using an index) so that the decoder can identify which row was affected by UPDATE and DELETE operations.
Without this, the old tuple key may not be logged in the WAL, making it impossible to reconstruct meaningful change events.
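The schema-at-LSN idea can be illustrated with a small sketch. All names here are hypothetical and the real SnapBuild machinery is far more involved, but the core lookup is the same: keep schema versions keyed by the LSN at which each catalog change took effect, and resolve a tuple's descriptor by finding the newest version at or before the change's LSN.

```python
import bisect

class HistoricCatalog:
    """Toy versioned catalog: schema versions keyed by the LSN
    at which the DDL that produced them committed."""

    def __init__(self):
        self.lsns = []      # sorted LSNs of schema changes
        self.schemas = []   # column list in effect from that LSN onward

    def record_ddl(self, lsn, columns):
        self.lsns.append(lsn)
        self.schemas.append(columns)

    def schema_at(self, lsn):
        # Newest schema version whose LSN is <= the change's LSN.
        i = bisect.bisect_right(self.lsns, lsn) - 1
        if i < 0:
            raise LookupError("no catalog snapshot covers this LSN")
        return self.schemas[i]

catalog = HistoricCatalog()
catalog.record_ddl(0x1A000000, ["id", "amount"])
catalog.record_ddl(0x1A000200, ["id", "amount", "status"])  # ALTER TABLE ... ADD COLUMN

# A change written before the ALTER decodes with the old two-column schema:
print(catalog.schema_at(0x1A000060))   # ['id', 'amount']
print(catalog.schema_at(0x1A000300))   # ['id', 'amount', 'status']
```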
Output Plugins
Architecture
An output plugin is a shared library that implements a well-defined callback interface.
PostgreSQL's logical decoding infrastructure invokes these callbacks at specific points during the decoding lifecycle.
The plugin never reads the WAL directly; it receives pre-decoded, schema-resolved change events.
The core callbacks are:
- startup_cb: Called when the replication slot is created or a decoding session begins. Used to negotiate options with the consumer (e.g., whether to include transaction boundaries, output format preferences).
- begin_cb: Invoked at the start of each decoded transaction.
- change_cb: Invoked for each row-level change (INSERT, UPDATE, DELETE). It receives the relation metadata, the old tuple (for UPDATE/DELETE if available), and the new tuple (for INSERT/UPDATE).
- truncate_cb: Invoked for TRUNCATE operations (added in PostgreSQL 11).
- commit_cb: Invoked when a transaction is fully emitted.
- message_cb: Handles logical decoding messages, which are user-defined WAL records that applications can inject for custom signaling.
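The callback contract can be mimicked in a short sketch. The method names mirror the C callbacks above, but everything else (the class shape, the JSON format) is invented for illustration; a real plugin is a C shared library driven by the decoding machinery.

```python
import json

class JsonOutputPlugin:
    """Toy analogue of an output plugin: one method per decoding callback.
    The decoder invokes these in lifecycle order; the plugin only serializes."""

    def startup_cb(self, options):
        # Negotiate consumer options, as in the real startup callback.
        self.include_xids = options.get("include-xids", False)
        self.out = []

    def begin_cb(self, xid, commit_lsn):
        event = {"event": "BEGIN"}
        if self.include_xids:
            event["xid"] = xid
        self.out.append(json.dumps(event))

    def change_cb(self, op, relation, old_tuple, new_tuple):
        msg = {"table": relation, "op": op}
        if old_tuple is not None:
            msg["old_keys"] = old_tuple   # availability depends on REPLICA IDENTITY
        if new_tuple is not None:
            msg["data"] = new_tuple
        self.out.append(json.dumps(msg))

    def commit_cb(self, xid, commit_lsn):
        event = {"event": "COMMIT"}
        if self.include_xids:
            event["xid"] = xid
        self.out.append(json.dumps(event))

plugin = JsonOutputPlugin()
plugin.startup_cb({"include-xids": True})
plugin.begin_cb(500, "0/1A000060")
plugin.change_cb("INSERT", "public.orders", None,
                 {"id": 42, "amount": 99.50, "status": "pending"})
plugin.commit_cb(500, "0/1A000060")
print("\n".join(plugin.out))
```

Note what is absent: there is no WAL parsing and no transaction bookkeeping. By the time any of these methods runs, the ReorderBuffer has already assembled a complete committed transaction.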
Built-in and Community Plugins
PostgreSQL ships with test_decoding, a reference plugin that outputs human-readable text.
The more widely used production plugins include:
- pgoutput: The built-in plugin used by PostgreSQL's native logical replication (publication/subscription). It emits a binary protocol understood by subscriber nodes. Since Debezium 1.x, pgoutput has been the preferred plugin for the Debezium PostgreSQL connector and remains the recommended choice for new CDC integrations as of current Debezium releases (2.x).
- wal2json: Emits changes as JSON documents, widely used for CDC (Change Data Capture) pipelines feeding Kafka, Elasticsearch, or custom consumers.
- decoderbufs: Emits changes as Protocol Buffers. This plugin was originally designed for Debezium integration but is now largely superseded by pgoutput for that use case; it may still be appropriate for custom Protobuf consumers.
Plugin Lifecycle
The following walkthrough illustrates the sequence of calls during a decoding session.
Walkthrough
Consider a replication slot using a hypothetical JSON output plugin.
A client connects and starts streaming changes.
During the session, two transactions commit: one inserts a row, and another updates, then deletes a row.
Step 1: Client connects to replication slot "my_slot"
-> Plugin startup_cb invoked
Plugin negotiates options: { "include-xids": true, "include-timestamp": true }
Step 2: WAL reader encounters COMMIT record for xid=500
ReorderBuffer emits transaction 500 (1 change collected)
-> begin_cb(xid=500, commit_lsn=0/1A000060)
Plugin outputs: { "xid": 500, "event": "BEGIN" }
-> change_cb(INSERT, relation="public.orders",
new_tuple={id=42, amount=99.50, status="pending"})
Plugin resolves column types using historical catalog view
Plugin outputs: { "table": "public.orders", "op": "INSERT",
"data": {"id": 42, "amount": 99.50, "status": "pending"} }
-> commit_cb(xid=500, commit_lsn=0/1A000060)
Plugin outputs: { "xid": 500, "event": "COMMIT",
"timestamp": "2024-01-15 10:30:00Z" }
Step 3: WAL reader encounters COMMIT record for xid=501
ReorderBuffer emits transaction 501 (2 changes collected)
-> begin_cb(xid=501, commit_lsn=0/1A000120)
-> change_cb(UPDATE, relation="public.orders",
old_tuple={id=42}, new_tuple={id=42, amount=99.50, status="shipped"})
Plugin outputs: { "table": "public.orders", "op": "UPDATE",
"old_keys": {"id": 42},
"data": {"id": 42, "amount": 99.50, "status": "shipped"} }
-> change_cb(DELETE, relation="public.orders",
old_tuple={id=42})
Plugin outputs: { "table": "public.orders", "op": "DELETE",
"old_keys": {"id": 42} }
-> commit_cb(xid=501, commit_lsn=0/1A000120)
Step 4: Client acknowledges LSN 0/1A000120
Replication slot advances; WAL before this LSN becomes reclaimable.
Note that the old tuple content for UPDATE and DELETE depends on the table's REPLICA IDENTITY setting.
With the default setting, only the primary key columns appear in the old tuple.
With REPLICA IDENTITY FULL, all column values are included, which is essential for consumers that need to apply changes idempotently or resolve conflicts.
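The effect of the REPLICA IDENTITY setting on the decoded old tuple can be sketched as a simple rule. This is a simplified model (the real decision is made when the WAL record is written, not at decode time), but it captures what a consumer will and will not see:

```python
def old_tuple_for(identity, pk_columns, row):
    """Model which old-tuple columns a decoder sees for UPDATE/DELETE.
    identity: 'default' (primary key columns only), 'full' (all columns),
    or 'nothing' (no old tuple logged at all)."""
    if identity == "full":
        return dict(row)                        # every column value available
    if identity == "default":
        return {c: row[c] for c in pk_columns}  # key columns only
    return None   # 'nothing': the affected row cannot be identified

row = {"id": 42, "amount": 99.50, "status": "pending"}
print(old_tuple_for("default", ["id"], row))   # {'id': 42}
print(old_tuple_for("full", ["id"], row))      # all three columns
```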
Engineering Tradeoffs and Considerations
WAL Retention Pressure
A replication slot pins WAL segments: PostgreSQL cannot recycle WAL files that the slot's consumer has not yet acknowledged.
If a consumer falls behind or disconnects, WAL accumulates on disk without bound.
This is a serious operational risk.
PostgreSQL 13 introduced max_slot_wal_keep_size to cap WAL retention.
When the limit is exceeded, the slot is marked invalid (it remains visible in pg_replication_slots but becomes unusable and must be dropped manually), allowing PostgreSQL to reclaim the pinned WAL.
Monitoring slot lag and setting this parameter are both essential in production.
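Monitoring amounts to comparing the slot's acknowledged position against the current WAL position. An LSN such as 0/1A000120 is a 64-bit byte position printed as two hexadecimal halves, so lag in bytes can be computed directly; server-side this is what pg_wal_lsn_diff() does. A minimal client-side helper:

```python
def lsn_to_int(lsn: str) -> int:
    """Parse an LSN like '0/1A000120' into its 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slot_lag_bytes(current_wal_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL the slot's consumer has not yet acknowledged."""
    return lsn_to_int(current_wal_lsn) - lsn_to_int(confirmed_flush_lsn)

# e.g. values read from pg_current_wal_lsn() and pg_replication_slots:
print(slot_lag_bytes("0/1A000120", "0/1A000060"))   # 192
```

Alerting when this value approaches max_slot_wal_keep_size gives operators a chance to fix the consumer before the slot is invalidated.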
Decoding Overhead
Logical decoding is CPU-intensive.
Each tuple must be deserialized, schema-resolved, and re-serialized by the output plugin.
For high-throughput OLTP systems generating millions of changes per second, this overhead can be substantial.
PostgreSQL 16 introduced support for performing logical decoding on a physical standby server, offloading this work from the primary.
The standby must be running in hot standby mode and must be able to apply WAL at sufficient speed to avoid falling behind the primary, which introduces additional operational considerations.
Consistency and Ordering Guarantees
Because the ReorderBuffer emits only committed transactions, in the order their COMMIT records appear in the WAL (commit-LSN order), the output stream provides a totally ordered, commit-order-consistent view of all committed changes.
Consumers see a consistent, gap-free history with no partial transactions.
This property is what makes logical decoding suitable for building materialized views, audit logs, and cross-system replication.
Plugin Filtering
Output plugins can implement filtering logic, emitting changes only for specific tables, schemas, or operation types.
The pgoutput plugin, for instance, uses publication definitions to filter at the plugin level: change_cb is not invoked for tables that are not part of any publication, avoiding the serialization cost for irrelevant changes.
However, the ReorderBuffer must still process and reassemble all WAL records regardless of filtering.
The transaction assembly cost cannot be eliminated.
Filtering reduces serialization and network overhead, not WAL-reading overhead.
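A filter of this kind is just a cheap predicate applied before serialization. Sketched here with a publication-style table set (the structure is invented; pgoutput consults actual publication catalogs):

```python
class FilteringPlugin:
    """Toy change-callback filter: serialize only changes for published tables.
    Transaction reassembly has already happened by the time this runs."""

    def __init__(self, published_tables):
        self.published = set(published_tables)
        self.emitted = []
        self.skipped = 0

    def change_cb(self, op, relation, new_tuple):
        if relation not in self.published:
            self.skipped += 1      # no serialization or network cost paid
            return
        self.emitted.append((op, relation, new_tuple))

plugin = FilteringPlugin({"public.orders"})
plugin.change_cb("INSERT", "public.orders", {"id": 1})
plugin.change_cb("INSERT", "public.audit_log", {"id": 7})
print(len(plugin.emitted), plugin.skipped)   # 1 1
```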
Transactional Streaming
For very large transactions, waiting until COMMIT to start emitting changes introduces latency and memory pressure.
PostgreSQL 14 added support for streaming in-progress transactions to the output plugin via a dedicated family of callbacks, including stream_start_cb, stream_change_cb, stream_stop_cb, stream_abort_cb, and stream_commit_cb.
Output plugins must explicitly opt in to this capability; it is not enabled by default.
The consumer receives changes before commit and must handle potential rollback, which significantly increases consumer-side protocol complexity.
This tradeoff is nevertheless essential for systems that cannot tolerate buffering multi-gigabyte transactions.
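The consumer-side obligation can be sketched as a staging buffer keyed by transaction: streamed changes are held until the stream commit arrives and thrown away on a stream abort. This is a toy model; a real consumer must also persist its progress and handle restarts.

```python
class StreamingConsumer:
    """Toy consumer for streamed in-progress transactions: changes arrive
    before COMMIT, so they must be staged and discarded on abort."""

    def __init__(self):
        self.staged = {}     # xid -> changes received so far
        self.applied = []    # changes from committed transactions only

    def stream_change(self, xid, change):
        self.staged.setdefault(xid, []).append(change)

    def stream_commit(self, xid):
        # Only now is it safe to make the transaction's changes visible.
        self.applied.extend(self.staged.pop(xid, []))

    def stream_abort(self, xid):
        self.staged.pop(xid, None)    # rollback: drop everything staged

consumer = StreamingConsumer()
consumer.stream_change(600, {"op": "INSERT", "id": 1})
consumer.stream_change(601, {"op": "INSERT", "id": 2})
consumer.stream_abort(601)            # transaction 601 rolled back
consumer.stream_commit(600)
print(consumer.applied)               # only transaction 600's change survives
```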
Key Points
- Logical decoding reconstructs row-level change events from the physical WAL, decoupling consumers from storage format and database version.
- The ReorderBuffer ensures that only committed transactions are emitted, in the order their COMMIT records appear in the WAL (commit-LSN order), providing a totally ordered, commit-order-consistent change stream.
- Historical catalog state is recovered by replaying catalog WAL records into a private versioned view, allowing correct schema resolution even when tables are altered after the original change was written.
- Output plugins implement a callback interface (begin, change, commit, and others) and are responsible solely for serialization, not for WAL parsing or transaction assembly.
- REPLICA IDENTITY configuration directly controls which column values are available in old tuples for UPDATE and DELETE events.
- Replication slots pin WAL on disk, creating operational risk if consumers fall behind; bounded retention via max_slot_wal_keep_size and active slot monitoring are essential in production.
- Streaming large in-progress transactions (PostgreSQL 14+, requires explicit plugin opt-in) reduces latency and memory pressure but shifts rollback handling to the consumer.
- pgoutput is the recommended plugin for both native logical replication and Debezium-based CDC; decoderbufs is largely legacy for new deployments.