Introduction
Event-driven architectures depend on a simple contract: producers write events, and consumers read them. The schema of those events defines the contract. In any long-lived system, that schema will change. Fields get added, removed, or renamed. Types get widened. Entire event structures get reorganized. The challenge is making these changes without breaking the producers and consumers that are already running. There may be dozens or hundreds of them, each deployed on its own release cycle.
Schema evolution is the set of techniques and constraints that allow event schemas to change over time while maintaining compatibility between independently deployed components. Getting it wrong causes production outages. Getting it right requires understanding compatibility modes, serialization format trade-offs, and the role of schema registries.
Compatibility Modes
The most important concept in schema evolution is the compatibility mode, which defines the rules governing how a schema may change relative to its predecessors.
Naming note: The terms "backward" and "forward" compatibility are used inconsistently across the industry. This article follows the convention used by Confluent Schema Registry and most Kafka ecosystem tooling, which is also the most widely adopted framing. Readers consulting other sources (including some readings of the Avro specification) should verify which convention is in use.
Backward Compatibility
A new schema is backward compatible if consumers using the new schema can read data produced with the old schema. This is the guarantee needed when consumers are upgraded before producers, and it is the most commonly used mode (it is the default in Confluent Schema Registry). For example, adding a new field with a default value is backward compatible: a consumer expecting the field will simply use the default when reading old events that lack it.
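The default-filling behavior can be illustrated with a minimal sketch of name-based resolution. It assumes a deliberately simplified schema model in which each schema is a Python dict mapping field names to a spec with an optional `default`; the `resolve` helper and the order fields are hypothetical and are not the Avro library's API.

```python
# Hypothetical, simplified schema model: field name -> {"default": value} (default is optional).
# This mimics name-based resolution for illustration; it is not the Avro library's API.

def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema by field name."""
    result = {}
    for name, spec in reader_schema.items():
        if name in record:
            result[name] = record[name]
        elif "default" in spec:
            # Field absent from the writer's data: fall back to the reader's default.
            result[name] = spec["default"]
        else:
            raise ValueError(f"field {name!r} missing and reader declares no default")
    # Fields the writer sent but the reader does not declare are silently ignored.
    return result


# The old writer schema had order_id and amount; the new reader adds currency with a default.
new_reader = {
    "order_id": {},
    "amount": {},
    "currency": {"default": "USD"},
}

old_event = {"order_id": "o-1", "amount": 42}
print(resolve(old_event, new_reader))  # {'order_id': 'o-1', 'amount': 42, 'currency': 'USD'}
```

If the new reader had declared `currency` without a default, the same old event would raise instead, which is exactly the change a backward compatibility check rejects.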
Forward Compatibility
A new schema is forward compatible if consumers using the old schema can read data produced with the new schema. This matters when producers are upgraded before consumers. Adding a field is typically forward compatible, because old consumers ignore fields they do not recognize. Removing a field is forward compatible only if old consumers treat that field as optional and can tolerate its absence (in Avro terms, the old reader schema must declare a default for it). Removing a field that old consumers depend on is a breaking change regardless of how the field was originally declared.
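The same kind of toy resolver shows both sides of forward compatibility: an old consumer ignores a field the new producer added, but only survives a removal if it declares a default. As before, the schema model, the `resolve` helper, and the field names are hypothetical simplifications, not a real library API.

```python
# Hypothetical, simplified schema model: field name -> {"default": value} (default is optional).

def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema by field name."""
    result = {}
    for name, spec in reader_schema.items():
        if name in record:
            result[name] = record[name]
        elif "default" in spec:
            result[name] = spec["default"]
        else:
            raise ValueError(f"field {name!r} missing and reader declares no default")
    return result  # unknown writer fields (here: "channel") are ignored


# The new producer dropped "coupon" and added "channel".
new_event = {"order_id": "o-2", "amount": 10, "channel": "web"}

# An old consumer that treats coupon as optional tolerates the removal.
tolerant_old_reader = {"order_id": {}, "amount": {}, "coupon": {"default": None}}
print(resolve(new_event, tolerant_old_reader))  # {'order_id': 'o-2', 'amount': 10, 'coupon': None}

# An old consumer that requires coupon is broken by the same change.
strict_old_reader = {"order_id": {}, "amount": {}, "coupon": {}}
try:
    resolve(new_event, strict_old_reader)
except ValueError as err:
    print("forward-incompatible:", err)
```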
Full Compatibility
Full compatibility is the intersection of backward and forward compatibility. A schema change is fully compatible if old consumers can read new data and new consumers can read old data. This is the safest mode and the most restrictive. In practice, it limits changes to adding optional fields with defaults and removing optional fields that no existing consumer depends on.
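Because full compatibility is just "backward and forward at once", all three modes can be expressed with a single "can this reader read that writer" check applied in both directions. The sketch below assumes a simplified schema model (field name to `type` plus optional `default`) and ignores type promotions, nested records, and aliases that real checkers handle; `can_read` and `check` are hypothetical names. Registries typically compare against the latest registered version, and some (such as Confluent's `*_TRANSITIVE` modes) compare against every earlier version.

```python
# Hypothetical simplified schema model: field name -> {"type": ..., optional "default": ...}.

def can_read(reader: dict, writer: dict) -> bool:
    """True if data written with `writer` can be resolved into `reader`."""
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False  # type changed; promotions are deliberately not modelled
        elif "default" not in spec:
            return False  # reader requires a field the writer never sends
    return True  # extra writer fields are ignored by the reader


def check(mode: str, new: dict, old: dict) -> bool:
    backward = can_read(reader=new, writer=old)  # new consumers reading old data
    forward = can_read(reader=old, writer=new)   # old consumers reading new data
    return {"BACKWARD": backward, "FORWARD": forward, "FULL": backward and forward}[mode]


v1 = {"order_id": {"type": "string"}, "amount": {"type": "int"}}
v2_optional = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_required = {**v1, "currency": {"type": "string"}}
v2_retyped = {**v1, "amount": {"type": "string"}}

for label, new in [("add optional", v2_optional), ("add required", v2_required), ("change type", v2_retyped)]:
    print(label, {mode: check(mode, new, v1) for mode in ("BACKWARD", "FORWARD", "FULL")})
# add optional {'BACKWARD': True, 'FORWARD': True, 'FULL': True}
# add required {'BACKWARD': False, 'FORWARD': True, 'FULL': False}
# change type {'BACKWARD': False, 'FORWARD': False, 'FULL': False}
```

Only the optional, defaulted field passes in both directions, which matches the restriction described above.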
Breaking Changes
Some changes are inherently incompatible. Renaming a field (in name-based formats such as Avro and JSON), changing a field's type from int to string, or removing a required field will break consumers. These changes require coordination strategies such as dual-writing to a new topic, consumer group migration, or versioned event types.
Serialization Formats and Their Evolution Properties
Not all serialization formats handle schema evolution equally well.
Apache Avro was designed with schema evolution as a first-class concern. It uses a writer schema (embedded or referenced at write time) and a reader schema (the consumer's expected schema) with well-defined resolution rules. Avro's schema resolution algorithm handles field additions, removals, and type promotions (e.g., int to long) automatically. Fields are matched by name; a field present in the reader schema but absent from the writer schema is filled with the reader's default, and resolution fails if the reader declares no default.
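The primitive type promotions that Avro resolution permits are a small fixed table, listed in the "Schema Resolution" section of the Avro specification. The sketch below encodes that table; the `primitive_compatible` helper is a hypothetical name, and complex types (records, unions, enums, arrays, maps) are out of scope.

```python
# Primitive promotions allowed by Avro schema resolution: writer type -> acceptable reader types.
# Identical types always match; anything not listed here (e.g. int -> string) fails resolution.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def primitive_compatible(writer_type: str, reader_type: str) -> bool:
    """True if a value written as `writer_type` can be read as `reader_type`."""
    return writer_type == reader_type or reader_type in PROMOTIONS.get(writer_type, set())


print(primitive_compatible("int", "long"))    # True: widening is allowed
print(primitive_compatible("long", "int"))    # False: narrowing is not
print(primitive_compatible("int", "string"))  # False: the breaking change described earlier
```

The asymmetry matters for compatibility modes: widening `int` to `long` in the reader is backward compatible, but an old `int` reader cannot read `long` data, so the same change is not forward compatible.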
Protocol Buffers (protobuf) uses numeric field tags rather than names. This makes renaming a field safe on the wire (only the tag matters) and makes adding new optional fields straightforward. In proto3, unknown fields are preserved by default. Early proto3 releases (before protobuf 3.5) discarded unknown fields during parsing, so systems running older toolchains should verify this behavior. Changing a field's type (apart from a few wire-compatible pairs such as int32 and int64) or reusing a deleted field's tag breaks compatibility regardless of version.
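To make "only the tag matters" concrete, here is a hand-rolled sketch of the protobuf binary wire format, limited to varint and length-delimited fields and to non-negative integers. It is not the official `protobuf` library; `encode_field`, `decode`, and the tag-to-name maps are hypothetical. The decoder maps tags to whatever local names the consumer chose and keeps unrecognized fields as raw bytes, mirroring proto3's unknown-field preservation.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint, least significant group first (non-negative values only)."""
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_field(tag: int, value) -> bytes:
    """Encode an int as wire type 0 (varint) or a str as wire type 2 (length-delimited)."""
    if isinstance(value, int):
        return encode_varint((tag << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint((tag << 3) | 2) + encode_varint(len(data)) + data


def decode(buf: bytes, names_by_tag: dict):
    """Return (known fields keyed by local name, raw bytes of unknown fields)."""
    known, unknown = {}, bytearray()
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = decode_varint(buf, pos)
        tag, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if tag in names_by_tag:
            known[names_by_tag[tag]] = value
        else:
            unknown += buf[start:pos]  # keep the raw field so it can be re-serialized
    return known, bytes(unknown)


# New producer: tag 1 = order_id, tag 2 = amount, tag 3 = channel (newly added).
payload = encode_field(1, "o-3") + encode_field(2, 250) + encode_field(3, "web")

# Old consumer renamed its local fields and has never heard of tag 3.
fields, unknown = decode(payload, {1: "id", 2: "amount_cents"})
print(fields)                             # {'id': 'o-3', 'amount_cents': 250}
print(unknown == encode_field(3, "web"))  # True
```

Field names never appear in the bytes, which is why renaming is harmless while reusing a retired tag number would silently reinterpret old data.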
JSON Schema is flexible but lacks built-in resolution semantics. Evolution is possible but relies on application-level conventions. Without a schema registry enforcing rules, JSON-based event streams tend to accumulate ad-hoc compatibility issues over time.
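What "application-level conventions" can look like in practice is sketched below as a tolerant reader. The conventions themselves are assumptions for illustration: every event may carry a `schema_version`, consumers ignore unknown keys, and defaults for optional keys live in consumer code. Nothing in JSON enforces any of this, so each consumer must re-implement it consistently.

```python
import json

# Hypothetical conventions for an "order" event; none of these are enforced by JSON itself.
REQUIRED = ("order_id", "amount")
DEFAULTS = {"currency": "USD"}  # optional keys and the value assumed when absent


def read_order(raw: str) -> dict:
    """Parse an order event, tolerating unknown keys and filling documented defaults."""
    event = json.loads(raw)
    missing = [key for key in REQUIRED if key not in event]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    order = {key: event[key] for key in REQUIRED}
    for key, default in DEFAULTS.items():
        order[key] = event.get(key, default)
    order["schema_version"] = event.get("schema_version", 1)  # unversioned events treated as v1
    return order  # unknown keys such as "channel" are dropped here


print(read_order('{"order_id": "o-4", "amount": 5}'))
print(read_order('{"schema_version": 2, "order_id": "o-5", "amount": 7, "currency": "EUR", "channel": "app"}'))
```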
The Role of Schema Registries
A schema registry is a centralized service that stores versioned schemas and enforces compatibility checks at write time. Confluent Schema Registry, Apicurio Registry, and AWS Glue Schema Registry are common implementations.
The registry sits in the critical path of schema changes, not in the data path. When a producer attempts to register a new schema version, the registry evaluates it against the configured compatibility mode for that subject (typically mapped to a topic or event type). If the change violates the compatibility contract, registration is rejected before any incompatible events reach the stream.
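The registration gate can be modelled with a toy in-memory registry. This is a hypothetical sketch, not the API of Confluent, Apicurio, or AWS Glue: it stores versions per subject, assigns global numeric IDs, and enforces a non-transitive BACKWARD rule against the latest version using the same simplified schema model as the compatibility examples. The subject name `orders-value` is illustrative.

```python
import itertools

# Hypothetical simplified schema model: field name -> {"type": ..., optional "default": ...}.

def can_read(reader: dict, writer: dict) -> bool:
    for name, spec in reader.items():
        if name in writer:
            if writer[name]["type"] != spec["type"]:
                return False
        elif "default" not in spec:
            return False
    return True


class InMemoryRegistry:
    """Toy registry: each subject holds a list of (schema_id, schema); IDs are global."""

    def __init__(self):
        self._subjects = {}
        self._ids = itertools.count(1)

    def register(self, subject: str, schema: dict) -> int:
        versions = self._subjects.setdefault(subject, [])
        if versions:
            _, latest = versions[-1]
            # BACKWARD: the new schema must be able to read data written with the latest one.
            if not can_read(reader=schema, writer=latest):
                raise ValueError(f"rejected for subject {subject!r}: not backward compatible")
        schema_id = next(self._ids)
        versions.append((schema_id, schema))
        return schema_id


registry = InMemoryRegistry()
v1 = {"order_id": {"type": "string"}, "amount": {"type": "int"}}
v2 = {**v1, "currency": {"type": "string", "default": "USD"}}
v3 = {**v2, "region": {"type": "string"}}  # new field without a default

print(registry.register("orders-value", v1))  # 1
print(registry.register("orders-value", v2))  # 2
try:
    registry.register("orders-value", v3)
except ValueError as err:
    print(err)  # the incompatible schema never receives an ID
```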
This is a build-time or deploy-time gate, not a runtime overhead per message. Producers typically resolve schemas to numeric IDs, and those IDs are embedded in the event payload. In the Confluent Schema Registry wire format, the payload begins with a magic byte (0x00) followed by a 4-byte big-endian schema ID, totaling a 5-byte prefix before the serialized data. Consumers use the ID to fetch the writer schema from the registry, typically caching it so the registry is consulted once per schema ID rather than once per message, and perform resolution against their reader schema.
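The 5-byte prefix is easy to produce and parse with Python's `struct` module. The framing below follows the layout just described; the `CachingSchemaLookup` class and its `fetch` callable are hypothetical stand-ins for a registry client, included to show why lookups cost one round trip per schema ID rather than per message.

```python
import struct

MAGIC_BYTE = 0x00
HEADER = struct.Struct(">BI")  # big-endian, no padding: 1-byte magic + 4-byte schema ID = 5 bytes


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix serialized data with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message shorter than the 5-byte prefix")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte {magic:#x}")
    return schema_id, message[HEADER.size:]


class CachingSchemaLookup:
    """Fetch each writer schema once per ID, then serve it from memory."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: schema_id -> writer schema
        self._cache = {}
        self.fetch_count = 0

    def get(self, schema_id: int):
        if schema_id not in self._cache:
            self._cache[schema_id] = self._fetch(schema_id)
            self.fetch_count += 1
        return self._cache[schema_id]


message = frame(42, b"\x02\x04")
print(message.hex())  # 000000002a0204

lookup = CachingSchemaLookup(fetch=lambda schema_id: f"<writer schema {schema_id}>")
for _ in range(3):
    schema_id, payload = unframe(message)
    writer_schema = lookup.get(schema_id)
print(lookup.fetch_count)  # 1
```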
Practical Evolution Strategies
When a breaking change is truly necessary, several strategies can mitigate the impact:
- Topic versioning. Create a new topic (e.g., orders-v2) with the new schema. Run a bridge process that reads from the old topic, transforms events, and writes to the new topic. Migrate consumers gradually.
- Event type envelopes. Wrap events in a generic envelope with a type identifier and version number. Consumers dispatch based on type and version, maintaining handlers for multiple versions simultaneously.
- Lazy migration. Write all new events in the new format. When consumers encounter old events (e.g., during replay), they apply a migration function. This pushes complexity to the consumer side but avoids rewriting history.
- Schema-on-read with a data catalog. Store events in a raw format and apply schema interpretation at read time, similar to how data lakes operate. This trades write-time safety for read-time flexibility.
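Envelopes and lazy migration combine naturally: the envelope tells the consumer which version it is holding, and a chain of migration functions (often called upcasters) lifts old payloads to the current version before a single handler runs. The sketch below is one possible arrangement under assumed names (`OrderPlaced`, `UPCASTERS`, `dispatch` are all hypothetical); keeping a separate handler per version, as the envelope strategy also allows, is an equally valid design.

```python
import json

# Hypothetical migrations for an "OrderPlaced" event, each lifting version N to N + 1.
def _v1_to_v2(data: dict) -> dict:
    # v2 renamed "customer" to "customer_name" (a breaking change in name-based formats).
    migrated = dict(data)
    migrated["customer_name"] = migrated.pop("customer")
    return migrated


def _v2_to_v3(data: dict) -> dict:
    # v3 added "currency"; older events are assumed to have been in USD.
    return {**data, "currency": data.get("currency", "USD")}


UPCASTERS = {("OrderPlaced", 1): _v1_to_v2, ("OrderPlaced", 2): _v2_to_v3}
CURRENT_VERSION = {"OrderPlaced": 3}
HANDLERS = {"OrderPlaced": lambda d: f"order for {d['customer_name']} in {d['currency']}"}


def upcast(event_type: str, version: int, data: dict) -> dict:
    """Apply migrations one version at a time until the payload is current."""
    while version < CURRENT_VERSION[event_type]:
        data = UPCASTERS[(event_type, version)](data)
        version += 1
    return data


def dispatch(raw: str) -> str:
    """Read the envelope, migrate the payload lazily, then run the current handler."""
    envelope = json.loads(raw)
    data = upcast(envelope["type"], envelope["version"], envelope["data"])
    return HANDLERS[envelope["type"]](data)


old = json.dumps({"type": "OrderPlaced", "version": 1, "data": {"customer": "Ada"}})
new = json.dumps({"type": "OrderPlaced", "version": 3, "data": {"customer_name": "Grace", "currency": "EUR"}})
print(dispatch(old))  # order for Ada in USD
print(dispatch(new))  # order for Grace in EUR
```

Stored events are never rewritten; the cost is that every consumer replaying history must carry the full migration chain.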
Key Points
- Schema evolution enables independent deployment of producers and consumers by defining rules for how event schemas may change over time.
- Backward, forward, and full compatibility modes each serve different upgrade ordering scenarios; full compatibility is the safest but most restrictive.
- Avro's writer/reader schema resolution and protobuf's field tagging both provide strong built-in evolution support, while JSON requires external enforcement.
- Schema registries act as a deploy-time gate, rejecting incompatible schema changes before they reach the event stream.
- Breaking changes (type changes, required field removal, field tag reuse) require explicit migration strategies such as topic versioning or dual-writing.
- In the Confluent Schema Registry wire format, a 5-byte prefix (1 magic byte + 4-byte schema ID) is embedded in event payloads, decoupling the data path from the registry and keeping runtime overhead minimal.
- Long-lived event streams should default to full compatibility mode where feasible. Backward-only or forward-only modes remain appropriate when upgrade ordering is tightly controlled and the trade-offs are understood.
References
Martin Kleppmann, "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems," O'Reilly Media, 2017.
Jay Kreps, "The Log: What Every Software Engineer Should Know About Real-Time Data's Unifying Abstraction," LinkedIn Engineering Blog, 2013.
Apache Avro Specification, Version 1.11.0, "Schema Resolution," Apache Software Foundation, 2022.
Pat Helland, "Immutability Changes Everything," Communications of the ACM, Vol. 59, No. 1, 2016.
Confluent, "Schema Evolution and Compatibility," Confluent Documentation, 2023.
Google, "Protocol Buffers Language Guide (proto3)," developers.google.com, 2023.