RPC
Explanation from Martin Kleppmann
Understanding RPC: Remote Procedure Call
RPC (Remote Procedure Call) is a communication pattern that lets a program execute a function or procedure on another computer as if it were a local function call. It hides most of the networking complexity behind that call, though it cannot remove it.
Many RPC frameworks and systems also define their own wire protocols (gRPC, Thrift, Kafka's binary protocol, etc.).
Imagine you're writing code and you call a function:
result = calculate_sum(5, 10)
With RPC, that calculate_sum() function could actually be running on a different computer across the network, but your code doesn't need to know or care: it looks like a normal function call.
Under the hood, an RPC call typically involves:
A transport connection (usually TCP, often with HTTP or HTTP/2 on top)
IP addressing
Ports
Serialization (Protobuf/JSON/etc.)
Network latency
Timeouts and retries
Connection failures
So RPC = network communication disguised as a function call.
Key RPC Components
Client - The program making the call
Stub (Client-side) - Converts function calls into network messages (marshaling)
Network - Transmits the request
Stub (Server-side) - Converts network messages back into function calls (unmarshaling)
Server - Executes the actual function and returns the result
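To make the client stub / server stub split concrete, here is a minimal in-process sketch. It is a toy under stated assumptions, not any real framework: JSON stands in for the serialization format, and a plain function named network() stands in for the transport so it runs without sockets. calculate_sum matches the earlier example; PROCEDURES, server_stub, network, and make_client_stub are hypothetical names.

```python
import json

# --- Server side -----------------------------------------------------------
def calculate_sum(a, b):
    """The real procedure, living on the 'server'."""
    return a + b

PROCEDURES = {"calculate_sum": calculate_sum}  # hypothetical dispatch table


def server_stub(request_bytes):
    """Unmarshal a request, call the real function, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["params"])
    return json.dumps({"result": result}).encode("utf-8")


# --- "Network" -------------------------------------------------------------
def network(request_bytes):
    """Stand-in transport; a real system would send these bytes over TCP/HTTP."""
    return server_stub(request_bytes)


# --- Client side -----------------------------------------------------------
def make_client_stub(method):
    """Return a function that looks local but marshals its arguments."""
    def stub(*args):
        request_bytes = json.dumps({"method": method, "params": list(args)}).encode("utf-8")
        reply = json.loads(network(request_bytes).decode("utf-8"))
        return reply["result"]
    return stub


calculate_sum_remote = make_client_stub("calculate_sum")
print(calculate_sum_remote(5, 10))  # 15
```

Frameworks such as gRPC generate the two stub halves from a service definition instead of you writing them by hand.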
RPC is NOT a Single Protocol
Here's where it gets interesting: RPC is a concept, not a specific protocol. There are many different RPC implementations:
Popular RPC Implementations
XML-RPC (1998) - Uses XML for encoding, HTTP for transport. Simple but verbose.
JSON-RPC - Uses JSON instead of XML. Lighter weight.
SOAP (Simple Object Access Protocol) - XML-based, very formal, used in enterprise. Heavy and complex.
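gRPC (originally developed at Google, open-sourced in 2015) - Modern, uses Protocol Buffers by default, compact and fast. Built on HTTP/2.
Apache Thrift (originally developed at Facebook) - Cross-language RPC framework with its own interface definition language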
Java RMI (Remote Method Invocation) - Java-specific RPC
Microsoft RPC/DCOM - Windows-specific
gRPC: The Modern Standard
gRPC has become one of the most widely used modern RPC frameworks.
Simple Python gRPC Example
Let me show you a basic gRPC example in Python:
Step 1: Define the Service (Protocol Buffer)
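The original .proto listing is missing here. A plausible service definition for the Add call used in this walkthrough is sketched below, kept inside a Python string so every example stays in one language, with plain dataclasses standing in for the message classes protoc would generate. The names Calculator, AddRequest, AddResponse and the fields a, b, result are assumptions.

```python
from dataclasses import dataclass

# Hypothetical contract: in a real project this text lives in calculator.proto
# and is compiled with protoc (e.g. via grpcio-tools) into Python modules.
CALCULATOR_PROTO = """
syntax = "proto3";

package calculator;

service Calculator {
  rpc Add (AddRequest) returns (AddResponse);
}

message AddRequest {
  int32 a = 1;
  int32 b = 2;
}

message AddResponse {
  int32 result = 1;
}
"""


# Plain-Python stand-ins for the message classes protoc would generate.
@dataclass
class AddRequest:
    a: int = 0  # proto3 scalar fields default to zero
    b: int = 0


@dataclass
class AddResponse:
    result: int = 0
```

The numbers after each field (= 1, = 2) are field tags: they, not the names, identify fields on the wire, which is what lets old and new clients keep talking as the contract evolves.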
Step 2: Server Implementation
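The server listing is also missing. Real grpcio code needs generated modules, so this sketch swaps in the standard library's XML-RPC server to show the same shape: a servicer class with an Add method, exposed over the network on port 50051. In actual grpcio code the servicer would subclass the generated CalculatorServicer base class and receive (request, context) instead of plain numbers; CalculatorServicer and serve here are hypothetical stand-ins.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer


class CalculatorServicer:
    """Server-side implementation of the hypothetical Calculator service."""

    def Add(self, a, b):
        # The actual work happens here, on the server's machine.
        return a + b


def serve(host="127.0.0.1", port=50051):
    """Start an RPC server in a background thread and return it.

    Stand-in: uses the stdlib XML-RPC server instead of grpc.server(),
    so no code generation or third-party packages are needed.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(CalculatorServicer())  # exposes Add() by name
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Passing port=0 lets the OS pick a free port, which is handy for tests; server.server_address then reports the port actually bound.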
Step 3: Client Implementation
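For the client, the same stdlib stand-in applies: xmlrpc.client.ServerProxy plays the role that a generated CalculatorStub wrapped around grpc.insecure_channel("localhost:50051") would play in grpcio. The call stub.Add(5, 10) is marshalled into a request, sent over HTTP, and the reply is unmarshalled. run and start_demo_server are hypothetical names; the demo server exists only so this snippet runs on its own.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def run(url="http://127.0.0.1:50051"):
    """Call the remote Add procedure and return its result."""
    # ServerProxy acts as the client stub: attribute access plus a call
    # becomes a serialized request sent to the server.
    stub = xmlrpc.client.ServerProxy(url)
    return stub.Add(5, 10)  # looks local, executes on the server


def start_demo_server():
    """Throwaway local server (not part of the client) for trying run()."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(lambda a, b: a + b, "Add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_demo_server()
    host, port = server.server_address[:2]
    print(run(f"http://{host}:{port}"))  # prints 15
    server.shutdown()
    server.server_close()
```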
Notice: The client code calls stub.Add(...) like it's a local function, but it's actually executing on the server!
Simple JSON-RPC Example (Bash/curl)
JSON-RPC is simpler to demonstrate without special tools:
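The original curl listing is missing, so here is a minimal JSON-RPC 2.0 request handler in Python, with the equivalent curl call kept as a comment. The "add" method, the METHODS table, and the localhost URL are assumptions for illustration; the message shapes and error codes (-32700, -32600, -32601, -32602) follow the JSON-RPC 2.0 specification.

```python
import json

# Hypothetical procedure table for this sketch.
METHODS = {
    "add": lambda a, b: a + b,
}


def error(code, message, req_id):
    return {"jsonrpc": "2.0", "error": {"code": code, "message": message}, "id": req_id}


def handle_jsonrpc(raw):
    """Handle one JSON-RPC 2.0 request string; return the response string,
    or None for a notification (a request without an "id")."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps(error(-32700, "Parse error", None))
    if not isinstance(req, dict) or req.get("jsonrpc") != "2.0" or "method" not in req:
        return json.dumps(error(-32600, "Invalid Request", None))
    req_id = req.get("id")
    func = METHODS.get(req["method"])
    if func is None:
        response = error(-32601, "Method not found", req_id)
    else:
        params = req.get("params", [])
        try:
            # params may be positional (array) or named (object)
            result = func(**params) if isinstance(params, dict) else func(*params)
            response = {"jsonrpc": "2.0", "result": result, "id": req_id}
        except TypeError:
            # Simplification: any TypeError is reported as bad params.
            response = error(-32602, "Invalid params", req_id)
    if "id" not in req:
        return None  # notifications get no reply
    return json.dumps(response)


# Equivalent call over HTTP with curl (URL is hypothetical):
#   curl -X POST -H "Content-Type: application/json" \
#        -d '{"jsonrpc": "2.0", "method": "add", "params": [5, 10], "id": 1}' \
#        http://localhost:8000/
print(handle_jsonrpc('{"jsonrpc": "2.0", "method": "add", "params": [5, 10], "id": 1}'))
# {"jsonrpc": "2.0", "result": 15, "id": 1}
```

The "id" is what lets a client match responses to requests when several are in flight; leaving it out turns the call into a fire-and-forget notification.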
RPC vs REST: When to Use What?
🧠 How RPC compares to REST APIs
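| Aspect | REST API | RPC |
|---|---|---|
| Transport | Usually HTTP | TCP, HTTP/2, custom protocols |
| Format | Mostly JSON | Often binary (Protobuf, Avro, Thrift) |
| Style | Resource-oriented | Function-oriented |
| Typical use cases | External/public APIs | High-performance internal services |
| Speed | Usually slower (text encoding) | Usually faster (binary encoding) |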
RPC is used when you want faster, structured, function-like interactions between services.
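The "resource-oriented vs function-oriented" difference is easiest to see in the shape of the requests. This sketch only builds (never sends) HTTP requests with urllib: in the REST style the URL names a resource and the HTTP verb names the action, while in the RPC style (JSON-RPC here) there is one endpoint and the body names the procedure. The host api.example.com and the getUser/deleteUser method names are hypothetical.

```python
import json
import urllib.request

# Hypothetical host; these requests are only built, never sent.
BASE = "https://api.example.com"

# REST: the URL names a resource, the HTTP verb names the action.
rest_get = urllib.request.Request(f"{BASE}/users/42", method="GET")
rest_delete = urllib.request.Request(f"{BASE}/users/42", method="DELETE")


# RPC (JSON-RPC style): one endpoint, the body names the procedure to call.
def rpc_request(method, params, request_id):
    body = json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": request_id})
    return urllib.request.Request(
        f"{BASE}/rpc",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


rpc_get = rpc_request("getUser", {"userId": 42}, 1)
rpc_delete = rpc_request("deleteUser", {"userId": 42}, 2)

for req in (rest_get, rest_delete, rpc_get, rpc_delete):
    print(req.get_method(), req.full_url, req.data.decode("utf-8") if req.data else "")
```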
Use REST API When:
Building public APIs for web/mobile apps
You need browser compatibility
Human readability is important (debugging, testing)
You're exposing resources (users, products, etc.)
Third-party integration is needed
Use RPC (gRPC) When:
Building internal microservices
Performance is critical
You need real-time bidirectional streaming
Strong typing and contracts are important
Your system is service-oriented, not resource-oriented
You're building polyglot systems (multiple languages)
Where RPC is Used Today
Microservices Architecture - Netflix, Uber, Google use gRPC for inter-service communication
Cloud Services - Google Cloud, AWS use RPC internally
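Distributed Systems - Kubernetes components use gRPC internally (the API server talks to etcd over gRPC, and the kubelet talks to container runtimes through the gRPC-based CRI)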
Real-time Applications - Gaming servers, chat applications
IoT Systems - Device-to-server communication
Financial Systems - High-frequency trading, payment processing
RPC Advantages
Performance - Binary encodings such as Protobuf are usually smaller and faster to parse than JSON
Type Safety - Strongly typed contracts catch many interface mismatches before runtime
Code Generation - Automatically generates client/server code
Streaming - gRPC supports client, server, and bidirectional streaming (not just request/response)
Multi-language - Same service definition works across languages
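To put a number on the encoding advantage, here is a hand-rolled sketch of how Protocol Buffers would encode a hypothetical AddRequest {int32 a = 1; int32 b = 2;} message, compared with the same data as JSON. It implements only the varint wire type for non-negative values; real code would use the protobuf library rather than encoding by hand.

```python
import json


def encode_varint(n):
    """Protobuf base-128 varint: 7 bits per byte, high bit = 'more bytes follow'.

    Handles non-negative integers only (negative int32 values use a
    10-byte two's-complement form, not covered in this sketch).
    """
    if n < 0:
        raise ValueError("sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_add_request(a, b):
    """Encode the hypothetical AddRequest {int32 a = 1; int32 b = 2;}."""
    out = b""
    for field_number, value in ((1, a), (2, b)):
        if value == 0:
            continue  # proto3 does not write fields holding the default value
        tag = (field_number << 3) | 0  # wire type 0 = varint
        out += encode_varint(tag) + encode_varint(value)
    return out


binary = encode_add_request(5, 10)
text = json.dumps({"a": 5, "b": 10}).encode("utf-8")
print(binary.hex(), len(binary))  # 0805100a 4
print(text, len(text))            # b'{"a": 5, "b": 10}' 17
```

The binary form is smaller because field names never go on the wire (only the numeric tags from the schema), and small integers take a single byte. The flip side is the debugging cost listed below: those four bytes mean nothing without the .proto file.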
RPC Disadvantages
Complexity - More setup than simple REST
Debugging - Binary data is harder to inspect
Browser Support - Limited (though gRPC-Web exists)
Firewall Issues - Some firewalls block non-HTTP protocols
Tight Coupling - Client and server are more tightly coupled
Modern RPC Landscape
The RPC world is very active in 2025:
gRPC is widely used for microservices and internal APIs
REST still leads for public web APIs and mobile backends
GraphQL (a query language, not RPC) is popular for flexible data fetching
WebSockets used for real-time bidirectional communication
Newer alternatives include tRPC (end-to-end typed RPC for TypeScript) and Connect (a simpler, gRPC-compatible protocol from Buf)
Key Takeaways
RPC is a concept that lets you call functions on remote computers as if they were local.
Many implementations exist, with gRPC being the modern standard for high-performance internal services.
It's different from REST - RPC is action/function-oriented while REST is resource-oriented.
Choose based on needs - REST for public APIs, gRPC for internal microservices and performance-critical applications.
RPC has been around since the 1980s, but it's more relevant than ever thanks to modern implementations like gRPC powering today's distributed systems and microservices architectures!
🔌 Where RPC is used in data engineering
RPC-like communication happens in almost all distributed data systems:
Kafka
Brokers use RPC internally for cluster metadata sync.
Producers/consumers talk to brokers using a binary RPC protocol.
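As an illustration of what "binary RPC protocol" means here, this sketch frames a Kafka ApiVersions (api_key 18) version 0 request, based on the Kafka protocol guide: a big-endian INT32 size prefix, then request header v1 (INT16 api_key, INT16 api_version, INT32 correlation_id, nullable string client_id), then an empty body. The correlation id 7 and client id "demo-client" are hypothetical; check the protocol guide before relying on exact layouts for other API versions.

```python
import struct

API_VERSIONS_KEY = 18  # ApiVersions request, per the Kafka protocol guide


def encode_request_header_v1(api_key, api_version, correlation_id, client_id):
    """Request header v1: INT16 api_key, INT16 api_version, INT32 correlation_id,
    NULLABLE_STRING client_id (INT16 length + UTF-8 bytes). All big-endian."""
    cid = client_id.encode("utf-8")
    return struct.pack(">hhih", api_key, api_version, correlation_id, len(cid)) + cid


def frame(payload):
    """Every Kafka request and response is prefixed with its INT32 size."""
    return struct.pack(">i", len(payload)) + payload


# ApiVersions v0 has an empty body, so the request is just the framed header.
request = frame(encode_request_header_v1(API_VERSIONS_KEY, 0, 7, "demo-client"))
print(request.hex())
```

The broker echoes the correlation id in its response, which is how a client matches replies to the requests it has in flight on one connection.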
Spark
Driver talks to executors via RPC.
Executors shuffle data using network RPC calls.
Flink / Hive / Presto / Trino
All rely on internal RPC for coordination.
Cloud platforms
Many AWS service APIs are RPC-style calls sent over HTTP (for example, DynamoDB requests name the operation in an X-Amz-Target header).
Microservices
gRPC, Thrift, and Avro RPC are common.