gRPC in Distributed Systems

Introduction

Remote procedure calls have shaped distributed systems design since Birrell and Nelson formalized the concept in 1984.
The core idea is deceptively simple: make a call to a remote service look like a local function invocation.
In practice, the abstraction leaks in every direction.
Network failures, serialization overhead, versioning constraints, and latency all intrude on the illusion.
gRPC, open-sourced by Google in 2015 and built on HTTP/2, is among the most widely adopted modern RPC frameworks.
It addresses many historical pain points while introducing its own set of tradeoffs that engineers must understand to use it effectively.

Architecture and Protocol Design

gRPC is built on three foundational choices: Protocol Buffers (protobuf) for interface definition and serialization, HTTP/2 as the transport protocol, and code generation to produce typed client and server stubs across languages.

Protocol Buffers

Protobuf serves a dual role.
First, it is an Interface Definition Language (IDL) that specifies service contracts.
Second, it is a binary serialization format.
Unlike JSON or XML, protobuf uses a compact binary encoding with field tags instead of field names, yielding significantly smaller payloads.
Fields are identified by integer tags, which enables forward and backward compatibility: unknown fields are preserved rather than rejected, and missing fields assume default values.

A service definition looks like this:

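syntax = "proto3";

// LineItem, CreateOrderResponse, OrderQuery, and OrderUpdate are
// omitted for brevity; protoc requires them to be defined or imported.
service OrderService {
  // Unary: one request, one response.
  rpc CreateOrder(CreateOrderRequest) returns (CreateOrderResponse);
  // Server streaming: one request, a stream of responses.
  rpc StreamUpdates(OrderQuery) returns (stream OrderUpdate);
}

message CreateOrderRequest {
  string customer_id = 1;        // field tag 1
  repeated LineItem items = 2;   // field tag 2
}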

The protobuf compiler (protoc) generates client stubs and server interfaces in the target language.
This eliminates an entire class of bugs related to hand-written serialization and deserialization.
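
The tag-based encoding described earlier can be illustrated by hand, without the protobuf library. The sketch below lays out a string field the way the wire format does (a varint key combining field number and wire type, a varint length, then the UTF-8 bytes) and decodes it while stepping over tags the reader does not know. The function names are illustrative and not part of any protobuf API, and only wire types 0 and 2 are handled; a real runtime would typically retain unknown fields rather than discard them.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Return (value, next_position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, value: str) -> bytes:
    """Encode one string field: key, length, payload."""
    data = value.encode("utf-8")
    key = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(key) + encode_varint(len(data)) + data


def decode_string_fields(buf: bytes, known: dict) -> dict:
    """Decode known string fields; skip any field number not in `known`."""
    out, pos = {}, 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type == 0:            # varint value
            _, pos = decode_varint(buf, pos)
            continue
        if wire_type != 2:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        length, pos = decode_varint(buf, pos)
        value = buf[pos:pos + length]
        pos += length
        if field_number in known:
            out[known[field_number]] = value.decode("utf-8")
    return out


wire = encode_string_field(1, "c42")               # customer_id = 1
newer = wire + encode_string_field(15, "extra")    # field added by a newer writer
decoded = decode_string_fields(newer, {1: "customer_id"})
```

An older reader that only knows field 1 still decodes `customer_id` from the newer message, which is the compatibility property the integer tags provide.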

HTTP/2 Transport

diagram-2 — Multiplexed HTTP/2 streams over a single TCP connection

The choice of HTTP/2 is not incidental.
HTTP/1.1 handles one request at a time per connection, so a slow response blocks everything queued behind it (application-layer head-of-line blocking); concurrent requests therefore need multiple TCP connections, since pipelining is poorly supported in practice. It also transmits headers as uncompressed text.
HTTP/2 solves these problems with multiplexed streams over a single TCP connection, binary framing, header compression via HPACK, and flow control at both the connection and stream level.

For distributed systems, multiplexing is the most consequential feature.
A single gRPC channel between two services can carry hundreds of concurrent RPCs without the overhead of connection pooling that HTTP/1.1-based systems require.
Each RPC maps to an HTTP/2 stream, and streams are independent, so a slow response on one RPC does not block others at the application layer (though TCP-level head-of-line blocking remains a concern, which is why gRPC over QUIC is under active development).
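
As a rough illustration of multiplexing, the sketch below builds HTTP/2 frames with the standard 9-byte header (24-bit length, 8-bit type, 8-bit flags, 31-bit stream identifier) and reassembles interleaved DATA frames per stream. It ignores flow control, padding, HPACK, and every frame type other than DATA; `make_frame` and `demux` are hypothetical helper names, not part of any HTTP/2 library.

```python
import struct
from collections import defaultdict

DATA = 0x0  # HTTP/2 frame type code for DATA frames


def make_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Prefix payload with the 9-byte HTTP/2 frame header."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,          # top bit is reserved
    )
    return header + payload


def demux(buf: bytes) -> dict:
    """Group DATA payloads by stream id from a buffer of whole frames."""
    streams = defaultdict(bytes)
    pos = 0
    while pos < len(buf):
        hi, lo, frame_type, _flags, stream_id = struct.unpack_from(">BHBBI", buf, pos)
        length = (hi << 16) | lo
        stream_id &= 0x7FFFFFFF
        pos += 9
        if frame_type == DATA:
            streams[stream_id] += buf[pos:pos + length]
        pos += length
    return dict(streams)


# Two RPCs (client-initiated streams use odd ids) interleaved on one connection.
connection_bytes = b"".join([
    make_frame(DATA, 0, 1, b"he"),
    make_frame(DATA, 0, 3, b"wor"),
    make_frame(DATA, 0, 1, b"llo"),
    make_frame(DATA, 0, 3, b"ld"),
])
per_stream = demux(connection_bytes)
```

Because every frame names its stream, the receiver can make progress on stream 3 while stream 1 is still incomplete, which is the application-layer independence described above.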

Four Communication Patterns

gRPC defines four RPC types that map to different distributed systems interaction patterns:

Unary RPC: Single request, single response. The classic RPC model.
Server streaming: Client sends one request, server returns a stream of responses. Useful for watch/subscribe patterns or large result sets.
Client streaming: Client sends a stream of messages, server responds once. Useful for upload or aggregation scenarios.
Bidirectional streaming: Both sides send streams of messages independently. Enables real-time, full-duplex communication.

The streaming modes are what distinguish gRPC from simpler RPC frameworks.
They allow designers to model long-lived interactions (event feeds, telemetry pipelines, consensus protocol message exchanges) without resorting to polling or separate pub/sub infrastructure.

Walkthrough

diagram-1 — Unary gRPC call lifecycle across stub, HTTP/2 stream, and server

The lifecycle of a unary gRPC call involves several stages that are worth understanding in detail, as each stage is a potential source of failure or latency.

1. Client calls generated stub method with request object.
2. Stub serializes request using protobuf binary encoding.
3. gRPC framework constructs HTTP/2 request:
   a. Opens a new stream on the existing HTTP/2 connection.
   b. Sends HEADERS frame with:
      - :method = POST
      - :path = /package.ServiceName/MethodName
      - content-type = application/grpc
      - te = trailers
      - grpc-timeout = <time remaining until the deadline, e.g. 350m>
   c. Sends DATA frame(s) containing:
      - 1 byte: compression flag
      - 4 bytes: message length (big-endian)
      - N bytes: serialized protobuf message
4. Server receives stream, deserializes request, invokes handler.
5. Server handler returns response object.
6. Server sends a HEADERS frame (:status = 200, content-type = application/grpc), then serializes the response and sends DATA frame(s) in the same format.
7. Server sends trailing HEADERS frame with:
   - grpc-status = <status code>
   - grpc-message = <optional error detail>
8. Client receives response, deserializes, returns to caller.
9. If grpc-status != 0, client raises an error with the status code.

The 5-byte length-prefixed framing (step 3c) is a critical detail.
It allows gRPC to delineate messages within the HTTP/2 data stream, since HTTP/2 DATA frames do not necessarily align with application-level message boundaries.
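
The length-prefixed framing from step 3c can be sketched in a few lines. The example below writes the 1-byte compression flag and 4-byte big-endian length in front of each payload, then recovers whole messages from a byte buffer that may end partway through a message, as happens when DATA frame boundaries fall mid-message. `frame_message` and `read_messages` are hypothetical names, and the payloads stand in for serialized protobuf bytes.

```python
import struct

PREFIX = struct.Struct(">BI")  # 1-byte compressed flag + 4-byte big-endian length


def frame_message(payload: bytes, compressed: bool = False) -> bytes:
    """Prepend the 5-byte gRPC message prefix to a serialized payload."""
    return PREFIX.pack(1 if compressed else 0, len(payload)) + payload


def read_messages(buf: bytes):
    """Return (complete_messages, leftover_bytes) parsed from buf.

    Each message is a (compressed, payload) tuple. Bytes belonging to a
    message that has not fully arrived are returned as leftover.
    """
    messages = []
    pos = 0
    while len(buf) - pos >= PREFIX.size:
        flag, length = PREFIX.unpack_from(buf, pos)
        start = pos + PREFIX.size
        if len(buf) - start < length:
            break  # payload incomplete; wait for more bytes
        messages.append((bool(flag), buf[start:start + length]))
        pos = start + length
    return messages, buf[pos:]


stream = frame_message(b"abc") + frame_message(b"de")
first_batch, leftover = read_messages(stream[:10])        # cut mid-message
second_batch, remainder = read_messages(leftover + stream[10:])
```

The first read yields only the complete message and holds back the partial one; the second read, once the rest of the bytes arrive, yields the other.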

Deadlines, Cancellation, and Failure Semantics

gRPC's approach to failure handling is one of its most important design decisions for distributed systems.

Deadline Propagation

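A gRPC call is bounded by a deadline (an absolute point in time by which it must complete) rather than a per-hop timeout, provided the client sets one; by default most implementations apply no deadline.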
This distinction matters in deep call chains.
If Service A calls Service B with a 500ms deadline and Service B calls Service C, Service C inherits the remaining deadline (say, 350ms after network and processing delays).
This prevents cascading delays where downstream services waste work on requests that the upstream caller has already abandoned.
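Propagation is automatic only in implementations that carry the deadline in the call context (Go's context.Context and Java's io.grpc.Context, for example), and only when the handler passes that context to its outgoing calls; elsewhere the handler must compute the remaining time and set it explicitly.
On the wire, the deadline travels as a relative value in the grpc-timeout header.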
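
The Service A to B to C scenario can be sketched as follows. A hypothetical `Deadline` class fixes an absolute expiry once, and each hop derives its outgoing timeout from whatever time remains, formatted as a millisecond grpc-timeout value. The class, the helper name, and the injectable clock are assumptions made for illustration; real gRPC libraries expose deadlines through their own context types.

```python
import time


class Deadline:
    """An absolute expiry computed once, when the call starts."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + timeout_s

    def remaining(self) -> float:
        """Seconds left before expiry, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() == 0.0


def grpc_timeout_value(remaining_s: float) -> str:
    """Format remaining time in milliseconds, e.g. '350m'.

    The header value allows at most 8 digits, so the result is capped.
    """
    millis = max(0, round(remaining_s * 1000))
    return f"{min(millis, 99_999_999)}m"


# Simulated clock so the example is deterministic.
now = [100.0]
deadline = Deadline(0.5, clock=lambda: now[0])   # A calls B with 500 ms

now[0] = 100.15                                  # 150 ms spent before B calls C
header_for_c = grpc_timeout_value(deadline.remaining())

now[0] = 101.0                                   # well past the deadline
should_skip_work = deadline.expired()
```

Service C receives only the time that is actually left, and any hop that finds the deadline already expired can fail fast instead of starting work.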

Cancellation

When a client cancels an RPC or a deadline expires, the cancellation propagates through the call chain.
The server receives a cancellation signal (surfaced via context cancellation in Go, ServerCallContext.CancellationToken in C#, etc.), and well-written handlers check for cancellation and abort early.
This is essential for resource efficiency: without propagated cancellation, servers waste CPU and memory completing work that nobody will consume.
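
A minimal sketch of a handler that cooperates with cancellation is shown below. A `threading.Event` stands in for whatever cancellation signal the framework exposes, and `process_items` is a hypothetical handler body that checks the signal between units of work.

```python
import threading


def process_items(items, cancelled: threading.Event) -> list:
    """Process items one at a time, stopping as soon as cancellation is seen."""
    done = []
    for item in items:
        if cancelled.is_set():
            break  # caller has gone away; stop spending CPU on this RPC
        done.append(item * 2)
    return done


cancel = threading.Event()


def feed():
    """Yield ten items, simulating a client that cancels before the fourth."""
    for i in range(10):
        if i == 3:
            cancel.set()
        yield i


partial = process_items(feed(), cancel)
```

The check sits at the top of the loop, so the cost of a late cancellation is bounded by one unit of work rather than the whole request.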

Status Codes

gRPC defines 17 status codes, numbered 0 through 16 (e.g., OK, CANCELLED, DEADLINE_EXCEEDED, UNAVAILABLE, RESOURCE_EXHAUSTED).
These are more semantically precise than HTTP status codes for RPC scenarios.
For instance, UNAVAILABLE signals a condition that is most likely transient and can be retried with backoff, while FAILED_PRECONDITION indicates a retry will not help without changing system state.
Proper use of these codes enables intelligent retry policies in client libraries and service meshes.
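
A retry policy driven by status codes might look like the following sketch. The enum lists the standard gRPC codes with their numeric values; `RpcError`, `call_with_retry`, and the choice to retry only UNAVAILABLE are assumptions for illustration, and production clients would normally rely on the library's built-in retry support rather than a hand-written loop.

```python
import enum
import random
import time


class StatusCode(enum.IntEnum):
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    FAILED_PRECONDITION = 9
    ABORTED = 10
    OUT_OF_RANGE = 11
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    DATA_LOSS = 15
    UNAUTHENTICATED = 16


RETRYABLE = {StatusCode.UNAVAILABLE}  # assumption: only transient unavailability


class RpcError(Exception):
    """Hypothetical error type carrying a status code."""

    def __init__(self, code: StatusCode):
        super().__init__(code.name)
        self.code = code


def call_with_retry(rpc, max_attempts=4, initial_backoff=0.1,
                    multiplier=2.0, sleep=time.sleep):
    """Call rpc(), retrying retryable failures with jittered exponential backoff."""
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except RpcError as err:
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(random.uniform(0, backoff))  # jitter avoids synchronized retries
            backoff *= multiplier
```

A call that fails twice with UNAVAILABLE and then succeeds returns normally on the third attempt, while a FAILED_PRECONDITION error is raised to the caller immediately.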

Load Balancing Considerations

diagram-3 — Client-side, L7 proxy, and lookaside xDS load balancing topologies

gRPC's use of long-lived HTTP/2 connections creates a subtle but important load balancing challenge.
Traditional L4 (TCP-level) load balancers distribute connections, not requests.
Since a gRPC client maintains a single connection to a backend (or a small pool), all RPCs from that client flow through one backend, creating hot spots.

There are three common solutions:

Client-side load balancing: The client resolves multiple backend addresses (via DNS, a service registry, or xDS) and distributes RPCs across connections to different backends. gRPC has built-in support for pluggable name resolution and load balancing policies (round-robin, weighted, pick-first).
L7 proxy-based load balancing: A proxy (Envoy, for example) terminates the HTTP/2 connection and distributes individual RPCs to backends. This is transparent to the client but adds a network hop.
Lookaside load balancing (xDS): The client contacts a control plane to receive backend addresses and load balancing configuration. This is the model used in service mesh architectures and is the basis of gRPC's xDS integration.
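
The hot-spot effect is easy to see in a toy simulation. The sketch below compares connection-level distribution, where each client is pinned to one backend, with per-RPC round-robin, where each client spreads its own calls across all backends. Backend names and traffic figures are invented for the example, and the functions are not gRPC APIs.

```python
from itertools import cycle


def per_connection(backends: list, rpcs_per_client: list) -> dict:
    """L4-style: client i opens one connection and sends every RPC over it."""
    load = {b: 0 for b in backends}
    for i, rpc_count in enumerate(rpcs_per_client):
        load[backends[i % len(backends)]] += rpc_count
    return load


def per_rpc_round_robin(backends: list, rpcs_per_client: list) -> dict:
    """Client-side or L7-style: each RPC is routed independently."""
    load = {b: 0 for b in backends}
    for rpc_count in rpcs_per_client:
        picker = cycle(backends)  # each client keeps its own rotation
        for _ in range(rpc_count):
            load[next(picker)] += 1
    return load


backends = ["b1", "b2", "b3"]
traffic = [900, 30, 30]  # one chatty client, two quiet ones

pinned = per_connection(backends, traffic)
balanced = per_rpc_round_robin(backends, traffic)
```

With pinned connections the chatty client lands entirely on one backend; with per-RPC distribution each backend receives an equal share of the same traffic.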

Interceptors and Observability

gRPC provides an interceptor (or middleware) pattern on both client and server sides.
Interceptors are chained functions that wrap each RPC, enabling cross-cutting concerns without modifying business logic.
Common uses include:

Distributed tracing: Injecting and extracting trace context (e.g., W3C Trace Context headers) in metadata.
Metrics: Recording latency histograms, error rates, and throughput per method.
Authentication: Validating tokens or certificates before the handler executes.
Retries and hedging: The gRPC client library supports configurable retry policies (with exponential backoff) and hedged requests, specified via service configuration.

Because protobuf services are strongly typed and each RPC has a well-defined method path (/package.Service/Method), telemetry is naturally structured.
This is a significant advantage over REST-based systems where URL patterns must be normalized for meaningful aggregation.
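
The chaining idea can be sketched without the gRPC library. Below, an interceptor is a function that takes the next handler and returns a wrapped one; `chain`, the handler signature, the bearer token, and the `/shop.OrderService/CreateOrder` path are all hypothetical and chosen only to show the ordering and the per-method keying of metrics.

```python
import time


def chain(handler, *interceptors):
    """Wrap handler so the first interceptor listed runs outermost."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler


def auth_interceptor(next_handler):
    """Reject calls whose metadata lacks the expected token."""
    def wrapped(method, request, metadata):
        if metadata.get("authorization") != "Bearer secret-token":
            raise PermissionError("UNAUTHENTICATED")
        return next_handler(method, request, metadata)
    return wrapped


def make_metrics_interceptor(latencies: dict):
    """Record one latency sample per call, keyed by full method path."""
    def interceptor(next_handler):
        def wrapped(method, request, metadata):
            start = time.perf_counter()
            try:
                return next_handler(method, request, metadata)
            finally:
                latencies.setdefault(method, []).append(time.perf_counter() - start)
        return wrapped
    return interceptor


def create_order(method, request, metadata):
    """Business logic, unaware of auth or metrics."""
    return {"order_id": "o-1", "customer_id": request["customer_id"]}


latencies = {}
handler = chain(create_order, make_metrics_interceptor(latencies), auth_interceptor)
response = handler(
    "/shop.OrderService/CreateOrder",
    {"customer_id": "c42"},
    {"authorization": "Bearer secret-token"},
)
```

Placing the metrics interceptor outermost means rejected calls are measured too, which is usually what an error-rate dashboard needs.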

Tradeoffs and Limitations

gRPC is not universally superior.
Several tradeoffs deserve candid assessment:

Browser support: Native gRPC requires HTTP/2 with trailers, which browsers do not fully expose. gRPC-Web exists as a workaround, but it requires a proxy and does not support client or bidirectional streaming.
Human readability: Binary protobuf payloads are opaque without tooling. Debugging with curl or a browser is not practical. Tools like grpcurl and grpc-web-devtools partially address this.
Schema evolution discipline: While protobuf is designed for compatibility, it requires discipline. Changing a field's type or reusing a deleted field number can corrupt data silently, and renaming a field, although wire-compatible, breaks generated code and JSON mappings. This is largely a process problem rather than a technical one, but it catches teams off guard.
TCP head-of-line blocking: HTTP/2 multiplexing eliminates application-level head-of-line blocking, but packet loss on the underlying TCP connection stalls all streams. This is the motivation for exploring gRPC over QUIC (HTTP/3), where streams are independent at the transport layer.
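
The schema evolution hazard can be made concrete with a deliberately tiny decoder. The sketch below handles only short length-delimited fields (field numbers under 16, payloads under 128 bytes) and maps field numbers to names according to whichever schema the reader was built with. The field names `currency` and `coupon_code` and the three schema dictionaries are invented for the example.

```python
def decode_strings(buf: bytes, schema: dict) -> dict:
    """Decode single-byte-key, single-byte-length string fields by number."""
    out, pos = {}, 0
    while pos < len(buf):
        key, length = buf[pos], buf[pos + 1]
        field_number, wire_type = key >> 3, key & 7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        value = buf[pos + 2:pos + 2 + length]
        pos += 2 + length
        if field_number in schema:  # unknown numbers are ignored
            out[schema[field_number]] = value.decode("utf-8")
    return out


# Bytes written by an old client: field 2 holds a currency code.
old_bytes = bytes([(2 << 3) | 2, 3]) + b"EUR"

SCHEMA_V1 = {1: "customer_id", 2: "currency"}
SCHEMA_V2_REUSED = {1: "customer_id", 2: "coupon_code"}   # number 2 recycled
SCHEMA_V2_SAFE = {1: "customer_id", 3: "coupon_code"}     # number 2 retired

as_v1 = decode_strings(old_bytes, SCHEMA_V1)
as_reused = decode_strings(old_bytes, SCHEMA_V2_REUSED)
as_safe = decode_strings(old_bytes, SCHEMA_V2_SAFE)
```

Under the recycled schema the old currency value is read as a coupon code with no error raised, whereas retiring the number leaves the old field harmlessly ignored. Protobuf's `reserved` statement exists to keep retired numbers from being reused.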

Key Points

gRPC combines protobuf IDL/serialization, HTTP/2 transport, and code generation to provide a strongly-typed, high-performance RPC framework.
HTTP/2 multiplexing allows many concurrent RPCs over a single connection, but this creates load balancing challenges that require L7-aware or client-side solutions.
Deadline propagation (not just timeouts) and automatic cancellation across call chains prevent wasted work and cascading failures in deep service topologies.
Four communication patterns (unary, server streaming, client streaming, bidirectional streaming) cover a wide range of distributed interaction models without external messaging infrastructure.
The interceptor pattern provides a clean extension point for distributed tracing, authentication, retries, and metrics without coupling these concerns to business logic.
Protobuf's binary encoding and field-tag-based schema evolution enable compact payloads and forward/backward compatibility, at the cost of human readability.
gRPC's semantic status codes enable more precise error handling and retry logic than HTTP status codes alone.

References

Birrell, A.D. and Nelson, B.J. "Implementing Remote Procedure Calls." ACM Transactions on Computer Systems, 2(1), 1984.

Belshe, M., Peon, R., and Thomson, M. "Hypertext Transfer Protocol Version 2 (HTTP/2)." RFC 7540, IETF, 2015.

Google. "gRPC: A High Performance, Open Source Universal RPC Framework." https://grpc.io/docs/, 2015-present.

Varda, K. "Protocol Buffers: Google's Data Interchange Format." Google Technical Documentation, 2008.

Burns, B. "Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services." O'Reilly Media, 2018.