RPC


Explanation from Martin Kleppmann


Understanding RPC: Remote Procedure Call

RPC (Remote Procedure Call) is a communication pattern that lets a program execute a function or procedure on another computer as if it were a local function call. It hides most of the networking complexity behind that call.

RPC itself does not prescribe a wire format: most RPC frameworks define their own protocol (the gRPC protocol, the Thrift protocol, Kafka's binary protocol, etc.).

Imagine you're writing code and you call a function:

result = calculate_sum(5, 10)

With RPC, that calculate_sum() function could actually be running on a different computer across the network, but your code doesn't need to know or care. It feels like a normal function call.
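To make this concrete, here is a minimal sketch using Python's standard-library `xmlrpc` modules, which are one RPC implementation among many. The server registers `calculate_sum` and the client calls it through a proxy object. The loopback address, the OS-assigned port, and the helper name `start_server` are assumptions for illustration; in a real deployment the server would run on another machine.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def calculate_sum(a, b):
    """The 'remote' procedure: it runs in the server, not in the caller."""
    return a + b


def start_server():
    """Start an XML-RPC server on a free loopback port (illustrative helper)."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(calculate_sum)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
host, port = server.server_address

# The proxy plays the role of the client stub: the attribute call below
# becomes an HTTP POST carrying an XML-encoded request.
proxy = ServerProxy(f"http://{host}:{port}")
result = proxy.calculate_sum(5, 10)
print(result)

server.shutdown()
server.server_close()
```

The call site `proxy.calculate_sum(5, 10)` reads like a local call, even though the arguments travel over a socket and back.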

Under the hood, an RPC across the network still involves:

  • TCP or HTTP connections

  • IP addressing

  • Ports

  • Serialization (Protobuf/JSON/etc.)

  • Network latency

  • Timeouts and retries

  • Connection failures

So RPC = network communication disguised as a function call.
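Because the call is really network communication, it can fail in ways a local call cannot. The sketch below shows one common way to handle that: a retry wrapper with exponential backoff. The names `call_with_retries` and `FlakyStub` are hypothetical, and the failures are simulated rather than coming from a real network.

```python
import time


def call_with_retries(remote_call, *args, attempts=3, base_delay=0.01):
    """Invoke remote_call, retrying on network-style errors with backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return remote_call(*args)
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            # Wait longer after each failed attempt: 1x, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))


class FlakyStub:
    """Stand-in for a client stub whose first calls fail (hypothetical)."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def calculate_sum(self, a, b):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated dropped connection")
        return a + b


stub = FlakyStub(failures=2)
result = call_with_retries(stub.calculate_sum, 5, 10)
print(result, stub.calls)
```

Blind retries are only safe when the remote operation is idempotent; a call that charges a credit card, for example, needs deduplication on the server side as well.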

Key RPC Components

Client - The program making the call

Stub (Client-side) - Converts function calls into network messages (marshaling)

Network - Transmits the request

Stub (Server-side) - Converts network messages back into function calls (unmarshaling)

Server - Executes the actual function and returns the result
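The five components can be sketched in a few lines of Python. This is a toy under stated assumptions: JSON stands in for the serialization format, the `network` function simply hands bytes over in-process instead of using a socket, and all function names are illustrative.

```python
import json


def calculate_sum(a, b):
    """Server: the actual function."""
    return a + b


# Procedures the server is willing to expose, looked up by name.
PROCEDURES = {"calculate_sum": calculate_sum}


def server_stub(request_bytes):
    """Server-side stub: unmarshal the request, call the function, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["method"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def network(request_bytes):
    """Stand-in for the network: the bytes are handed over in-process."""
    return server_stub(request_bytes)


def client_stub(method, *args):
    """Client-side stub: marshal the call, send it, unmarshal the reply."""
    request_bytes = json.dumps({"method": method, "args": args}).encode("utf-8")
    reply_bytes = network(request_bytes)
    return json.loads(reply_bytes.decode("utf-8"))["result"]


# Client: the program making the call.
result = client_stub("calculate_sum", 5, 10)
print(result)
```

Only bytes cross the `network` boundary, which is why both sides need to agree on the encoding and on the set of procedure names.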


RPC is NOT a Single Protocol

Here's where it gets interesting: RPC is a concept, not a specific protocol. There are many different RPC implementations:

XML-RPC (1998) - Uses XML for encoding, HTTP for transport. Simple but verbose.

JSON-RPC - Uses JSON instead of XML. Lighter weight.

SOAP (Simple Object Access Protocol) - XML-based, very formal, used in enterprise. Heavy and complex.

gRPC (open-sourced by Google in 2015) - Modern, uses Protocol Buffers for compact binary messages. Built on HTTP/2.

Apache Thrift (originally developed at Facebook) - Cross-language RPC framework

Java RMI (Remote Method Invocation) - Java-specific RPC

Microsoft RPC/DCOM - Windows-specific


gRPC: The Modern Standard

gRPC has become one of the most widely used modern RPC frameworks.

Simple Python gRPC Example

A basic gRPC example in Python has three steps:

Step 1: Define the Service (Protocol Buffer)
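A service definition for this example might look like the sketch below. Only the method name `Add` comes from the text; the file name `calculator.proto`, the package, the service name `Calculator`, and the message and field names are assumptions made for illustration.

```proto
syntax = "proto3";

package calculator;

// Request and reply messages for the Add RPC.
message AddRequest {
  int32 a = 1;
  int32 b = 2;
}

message AddReply {
  int32 result = 1;
}

service Calculator {
  // Unary RPC: one request in, one reply out.
  rpc Add (AddRequest) returns (AddReply);
}
```

With the `grpcio-tools` package installed, running `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. calculator.proto` generates the Python modules `calculator_pb2.py` and `calculator_pb2_grpc.py`.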

Step 2: Server Implementation
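A server might look like the following sketch. It is not standalone: it assumes the `grpcio` package is installed and that `calculator_pb2` and `calculator_pb2_grpc` were generated from a `.proto` file defining a `Calculator` service with an `Add(AddRequest) returns (AddReply)` method and integer fields `a`, `b`, and `result`. Those names, and port 50051, are illustrative assumptions.

```python
from concurrent import futures

import grpc

# Generated from the (assumed) calculator.proto; not part of grpcio itself.
import calculator_pb2
import calculator_pb2_grpc


class CalculatorServicer(calculator_pb2_grpc.CalculatorServicer):
    """Implements the methods declared in the service definition."""

    def Add(self, request, context):
        # request is an AddRequest; the return value must be an AddReply.
        return calculator_pb2.AddReply(result=request.a + request.b)


def serve():
    # Each incoming RPC is handled on a thread from this pool.
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    calculator_pb2_grpc.add_CalculatorServicer_to_server(
        CalculatorServicer(), server
    )
    server.add_insecure_port("[::]:50051")  # no TLS: local testing only
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```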

Step 3: Client Implementation
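A matching client might look like the following sketch. It is not standalone: it assumes the `grpcio` package is installed, that `calculator_pb2` and `calculator_pb2_grpc` were generated from a `.proto` file defining a `Calculator` service with an `Add(AddRequest) returns (AddReply)` method, and that a server is listening on `localhost:50051`. Those names and the address are illustrative assumptions.

```python
import grpc

# Generated from the (assumed) calculator.proto; not part of grpcio itself.
import calculator_pb2
import calculator_pb2_grpc


def main():
    # The channel is the connection to the server; the stub wraps it
    # and exposes one Python method per RPC in the service definition.
    with grpc.insecure_channel("localhost:50051") as channel:
        stub = calculator_pb2_grpc.CalculatorStub(channel)
        request = calculator_pb2.AddRequest(a=5, b=10)
        # A deadline keeps the call from hanging if the server is unreachable.
        reply = stub.Add(request, timeout=2.0)
        print(reply.result)


if __name__ == "__main__":
    main()
```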

Notice: The client code calls stub.Add(...) like it's a local function, but it's actually executing on the server!

Simple JSON-RPC Example (Bash/curl)

JSON-RPC is simpler to demonstrate without special tools:
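A curl invocation would POST a JSON body to the server's endpoint. The sketch below builds the same JSON-RPC 2.0 envelope in Python and dispatches it in-process, so no server needs to be running. The method name `add` and the `handle` dispatcher are assumptions for illustration, not part of any particular library.

```python
import json


def add(a, b):
    return a + b


# Methods the server exposes, looked up by the request's "method" field.
METHODS = {"add": add}


def handle(raw_request):
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    request = json.loads(raw_request)
    method = METHODS.get(request["method"])
    if method is None:
        # -32601 is the JSON-RPC 2.0 code for "Method not found".
        response = {
            "jsonrpc": "2.0",
            "error": {"code": -32601, "message": "Method not found"},
            "id": request.get("id"),
        }
    else:
        response = {
            "jsonrpc": "2.0",
            "result": method(*request["params"]),
            "id": request.get("id"),
        }
    return json.dumps(response)


# The body a client would send in an HTTP POST.
request_body = json.dumps(
    {"jsonrpc": "2.0", "method": "add", "params": [5, 10], "id": 1}
)
response_body = handle(request_body)
print(request_body)
print(response_body)
```

The `id` field is echoed back so that a client with several requests in flight can match each response to its request.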


RPC vs REST: When to Use What?

🧠 How RPC compares to REST APIs

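| Topic     | REST API          | RPC                                |
|-----------|-------------------|------------------------------------|
| Transport | Usually HTTP      | TCP, HTTP/2, custom protocols      |
| Format    | Mostly JSON       | Protobuf, Avro, Thrift             |
| Style     | Resource-oriented | Function-oriented                  |
| Use cases | External APIs     | High-performance internal services |
| Speed     | Usually slower    | Usually faster                     |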

RPC is used when you want faster, structured, function-like interactions between services.

Use REST API When:

  • Building public APIs for web/mobile apps

  • You need browser compatibility

  • Human readability is important (debugging, testing)

  • You're exposing resources (users, products, etc.)

  • Third-party integration is needed

Use RPC (gRPC) When:

  • Building internal microservices

  • Performance is critical

  • You need real-time bidirectional streaming

  • Strong typing and contracts are important

  • Your system is service-oriented, not resource-oriented

  • You're building polyglot systems (multiple languages)


Where RPC is Used Today

Microservices Architecture - Netflix, Uber, Google use gRPC for inter-service communication

Cloud Services - Google Cloud, AWS use RPC internally

Distributed Systems - Kubernetes uses gRPC between the API server and etcd, and between the kubelet and container runtimes (CRI)

Real-time Applications - Gaming servers, chat applications

IoT Systems - Device-to-server communication

Financial Systems - High-frequency trading, payment processing


RPC Advantages

Performance - Binary encodings such as Protobuf are typically more compact and faster to parse than JSON

Type Safety - Strongly typed contracts prevent errors

Code Generation - Automatically generates client/server code

Streaming - Supports bidirectional streaming (not just request/response)

Multi-language - Same service definition works across languages
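The compactness part of the performance claim can be illustrated with the standard library. This sketch compares a JSON encoding of two integers with a fixed-width binary encoding built with `struct`. It is not Protobuf's actual wire format (which uses field tags and variable-length integers); it only shows why a pre-agreed binary layout tends to be smaller.

```python
import json
import struct

a, b = 5, 10

# Text encoding: field names and digits are sent as characters.
json_bytes = json.dumps({"a": a, "b": b}).encode("utf-8")

# Fixed-width binary encoding: two little-endian 32-bit signed integers.
# The layout is agreed on in advance, so no field names are sent.
binary_bytes = struct.pack("<ii", a, b)

print(len(json_bytes), len(binary_bytes))

# Decoding reverses the packing using the same agreed layout.
decoded_a, decoded_b = struct.unpack("<ii", binary_bytes)
```

The trade-off is the one listed under disadvantages: the binary form cannot be read without knowing the layout, whereas the JSON form is self-describing.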


RPC Disadvantages

Complexity - More setup than simple REST

Debugging - Binary data is harder to inspect

Browser Support - Limited (though gRPC-Web exists)

Firewall Issues - Some firewalls block non-HTTP protocols, and some proxies or load balancers lack full support for HTTP/2, which gRPC requires

Tight Coupling - Client and server are more tightly coupled


Modern RPC Landscape

The RPC world is very active in 2025:

gRPC is widely used for microservices and internal APIs

REST still leads for public web APIs and mobile backends

GraphQL (a query language, not RPC) is popular for flexible data fetching

WebSockets are used for real-time bidirectional communication

Newer alternatives include tRPC (TypeScript) and Connect (a simpler, gRPC-compatible protocol)


Key Takeaways

RPC is a concept that lets you call functions on remote computers as if they were local.

Many implementations exist, with gRPC being the modern standard for high-performance internal services.

It's different from REST - RPC is action/function-oriented while REST is resource-oriented.

Choose based on needs - REST for public APIs, gRPC for internal microservices and performance-critical applications.

RPC has been around since the 1980s, but it's more relevant than ever thanks to modern implementations like gRPC powering today's distributed systems and microservices architectures!


🔌 Where RPC is used in data engineering

RPC-like communication happens in almost all distributed data systems:

Kafka

  • Brokers use RPC internally for cluster metadata sync.

  • Producers/consumers talk to brokers using a binary RPC protocol.
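Binary RPC protocols of this kind commonly prefix each request with its length and tag it with an identifier so that responses can be matched to requests. The sketch below shows that general idea with a hypothetical, simplified frame layout. It is not Kafka's actual wire format, and the field names and values are assumptions for illustration.

```python
import struct

# Hypothetical frame layout: a 4-byte big-endian length, then a header of
# api_key (int16) and correlation_id (int32), then an opaque payload.
HEADER = struct.Struct(">hi")


def encode_frame(api_key, correlation_id, payload):
    """Build one length-prefixed request frame."""
    body = HEADER.pack(api_key, correlation_id) + payload
    return struct.pack(">i", len(body)) + body


def decode_frame(frame):
    """Parse a frame back into (api_key, correlation_id, payload)."""
    (length,) = struct.unpack_from(">i", frame, 0)
    body = frame[4:4 + length]
    api_key, correlation_id = HEADER.unpack_from(body, 0)
    return api_key, correlation_id, body[HEADER.size:]


frame = encode_frame(api_key=3, correlation_id=42, payload=b"topic-a")
print(decode_frame(frame))
```

The length prefix lets the receiver know where one request ends on a byte stream, and the correlation identifier lets a client keep several requests in flight on one connection.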

Spark

  • Driver talks to executors via RPC.

  • Executors shuffle data using network RPC calls.

Flink / Hive / Presto / Trino

All rely on internal RPC for coordination.

Cloud platforms

  • Many AWS APIs are RPC-style (action-oriented operations) carried over HTTP.

Microservices

gRPC, Thrift, and Avro RPC are common.

