Flight RPC
If IPC is the "shared language" and ADBC is the "database connector," then Arrow Flight RPC is the high-speed transport system for moving data across a network.
Request/response APIs such as REST (JSON over HTTP), or gRPC with ordinary row-oriented Protobuf messages, are designed around small messages. When you push 10 million rows through them, most of the time goes into turning the data into rows and serializing it, then reversing both steps on the client. Flight, which itself runs on top of gRPC, avoids this by keeping the data in its columnar Arrow format from the server's RAM all the way to the client's RAM.
The Core Philosophy: No More Serialization
In a typical bulk API call, the server can spend most of its time turning data into JSON, and the client most of its time turning JSON back into data.
With Flight, the server sends the Arrow RecordBatches as a stream of bytes in the Arrow IPC format. The client reads those bytes as Arrow buffers, with no per-value parsing step. For bulk transfers this can be one to two orders of magnitude faster than a JSON-based API, depending on the data and the network.
The Flight "Vocabulary"
Flight isn't just a blind stream; it's a protocol with specific commands:
list_flights(): "What datasets do you have available?"
get_flight_info(): "I want the 'taxi_data' dataset. How is it partitioned and where are the endpoints?"
do_get(): The most common command. "Start streaming the data to me now."
do_put(): "I'm sending you a stream of data to save."
do_action(): For custom commands like "Clean the cache" or "Drop this table."
Parallelism: The "Horizontal Scaling" Secret
This is where Flight leaves other protocols in the dust. A single get_flight_info() call can return multiple Endpoints.
If your data is stored across three different servers, the Flight client can open three simultaneous connections and stream different parts of the data at the same time. This means your transfer speed is limited by your network bandwidth, not by a single CPU core.
Simple Python Example (The Client)
Here is how a client requests data from a Flight server.
Where does Flight SQL fit in?
You will often hear about Flight SQL. This is a specific "dialect" of Flight that allows you to send standard SQL queries (SELECT * FROM...) over the Flight protocol.
It is essentially a high-performance alternative to JDBC/ODBC. Instead of a row-oriented database driver that hands you results one row at a time, you use a Flight SQL driver and receive Arrow RecordBatches directly from databases like Dremio, InfluxDB, or DuckDB (via a Flight wrapper).
The Flight SQL Workflow
The interaction follows a specific handshake to ensure performance:
Command: The client sends a CommandStatementQuery (your SQL string).
Info: The server returns a FlightInfo object containing the schema and "Tickets" (endpoints).
Fetch: The client uses the tickets to pull the actual RecordBatches in parallel.
Python Client Example
To run this, you need the pyarrow library with the flight components installed.
The Interoperability Connection (Full Circle)
Flight is the reason your Golang service and Python worker can run on different servers and still exchange data as if they shared one in-memory format.
Python cleans data -> converts to RecordBatches.
Flight Server (Python) sends those batches over the wire.
Flight Client (Go) receives the batches and processes the Arrow buffers as they arrive, with no deserialization step.
Summary
Use REST/JSON for small, simple web metadata.
Use Arrow Flight for moving dataframes, tables, or bulk analytical data between services.