ADBC


ADBC (Arrow Database Connectivity) is the bridge that solves the "Row-to-Column" bottleneck. ADBC is a set of APIs and drivers designed to standardize how applications fetch Arrow data from any database.

If JDBC/ODBC are the legacy "standard-definition" cables, ADBC is the "4K fiber-optic" connection for data.


The Core Architecture

In the old world (ODBC/JDBC), the database had to transform its internal storage into rows, send them to you, and you had to transform them back into columns for analysis. ADBC allows the database (if it supports Arrow) to pass those columns directly to your memory.

Key Components:

  • Driver Manager: A thin library that loads specific database drivers (like the Postgres or Snowflake driver).

  • Drivers: Specific implementations that know how to talk to a particular database (Postgres, SQLite, Flight SQL, etc.).

  • The Result Object: everything returns an ArrowArrayStream (exposed as a RecordBatchReader in Python), the streaming abstraction we discussed. The sketch below shows all three pieces working together.
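
As a minimal sketch (assuming the adbc_driver_sqlite and pyarrow packages are installed), here is the smallest possible ADBC round trip: the driver manager loads the SQLite driver behind a single import, and the query result comes back as Arrow data instead of rows.

import adbc_driver_sqlite.dbapi as sqlite_dbapi

# The dbapi module asks the driver manager to load the SQLite driver;
# swapping in Postgres or Flight SQL changes only the import and the URI.
with sqlite_dbapi.connect(":memory:") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1 AS answer")
        # fetch_arrow_table() drains the underlying ArrowArrayStream
        # into a pyarrow.Table, with no row-based parsing in between.
        print(cur.fetch_arrow_table())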


Python Example: High-Speed Ingestion

One of the best use cases for ADBC is bulk ingestion. Loading data into a database row by row with INSERT statements is slow; ADBC turns the whole load into a single binary, columnar operation.

import adbc_driver_postgresql.dbapi as pg_dbapi
import pyarrow as pa

# 1. Create some Arrow data (1 million rows)
data = pa.table({"id": range(1_000_000), "val": [1.5] * 1_000_000})

# 2. Connect via ADBC
uri = "postgresql://localhost:5432/postgres?user=admin&password=password"
with pg_dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        # 3. Ingest the table (ADBC handles the binary protocol)
        # This is often 10-20x faster than traditional SQLAlchemy/psycopg2 inserts
        cur.adbc_ingest("large_table", data, mode="create")
    conn.commit()

Python Example: Streaming Retrieval

Because ADBC is built for Streaming, you can fetch data from a database without loading the whole result set into RAM.
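
Here is a minimal sketch of that pattern, reusing the large_table and connection URI from the ingestion example above. fetch_record_batch() hands back a pyarrow.RecordBatchReader, so batches stream in one at a time and memory stays flat.

import adbc_driver_postgresql.dbapi as pg_dbapi

uri = "postgresql://localhost:5432/postgres?user=admin&password=password"
with pg_dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM large_table")
        # fetch_record_batch() returns a pyarrow.RecordBatchReader;
        # results stream in as you iterate instead of landing in RAM at once.
        reader = cur.fetch_record_batch()
        total_rows = 0
        for batch in reader:
            # Each batch is a pyarrow.RecordBatch; process it and move on.
            total_rows += batch.num_rows
        print(total_rows)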


Real-World Use Cases

A. The Data Migration Bridge (Snowflake to Postgres)

As we touched on earlier, ADBC is the best way to move data between two different databases. You use a Snowflake ADBC driver to fetch the result as Arrow (fetch_arrow_table) and a Postgres ADBC driver to adbc_ingest it. The data never leaves the Arrow format.
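
A rough sketch of that bridge, assuming both the adbc_driver_snowflake and adbc_driver_postgresql packages are installed; the connection strings and the sales_2024 table name are placeholders. The RecordBatchReader from Snowflake is handed straight to adbc_ingest; if your driver version only accepts a Table, materialize it first with fetch_arrow_table().

import adbc_driver_snowflake.dbapi as sf_dbapi
import adbc_driver_postgresql.dbapi as pg_dbapi

snowflake_uri = "user:password@account/database/schema"  # placeholder DSN
postgres_uri = "postgresql://localhost:5432/postgres?user=admin&password=password"

with sf_dbapi.connect(snowflake_uri) as src, pg_dbapi.connect(postgres_uri) as dst:
    with src.cursor() as read_cur, dst.cursor() as write_cur:
        read_cur.execute("SELECT * FROM sales_2024")
        # Stream Arrow record batches out of Snowflake...
        reader = read_cur.fetch_record_batch()
        # ...and hand the stream straight to the Postgres driver.
        write_cur.adbc_ingest("sales_2024", reader, mode="create")
    dst.commit()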

B. Building High-Performance Data Apps

If you are building a dashboard (using Streamlit or Plotly) that needs to query 10 million rows from a database, ADBC lets the dashboard pull that data with almost no serialization overhead. Users aren't left watching a "loading" spinner for 30 seconds while the driver parses rows.
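
A short sketch of that pattern, with a made-up page_views table and query; the result arrives as Arrow columns and converts to a DataFrame without the per-row parsing a traditional driver would do (pandas is assumed here, but Polars reads Arrow just as directly).

import adbc_driver_postgresql.dbapi as pg_dbapi

uri = "postgresql://localhost:5432/postgres?user=admin&password=password"

def load_dashboard_data():
    with pg_dbapi.connect(uri) as conn:
        with conn.cursor() as cur:
            # Hypothetical table and filter, just for illustration.
            cur.execute("SELECT * FROM page_views WHERE day >= '2024-01-01'")
            # Arrow columns -> DataFrame, skipping row-by-row deserialization.
            return cur.fetch_arrow_table().to_pandas()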

C. Moving Away from JDBC/ODBC

For years, Data Engineers had to install heavy Java environments just to use JDBC drivers. ADBC drivers are small, native C/C++ libraries (with Python/Go/Rust wrappers), making them much easier to deploy in Docker containers or Lambda functions.


ADBC vs. Flight SQL

This is where many people get stuck. Here is the simple rule:

  • Flight SQL is a network protocol (the "how" of data traveling over the wire).

  • ADBC is an interface (the "how" of your code talking to a driver).

Crucially: You can use an ADBC Driver for Flight SQL. This means you write ADBC-style code, and the driver handles the Flight RPC communication under the hood.
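
A minimal sketch of that idea, assuming a Flight SQL server is reachable at a placeholder gRPC endpoint and the adbc_driver_flightsql package is installed; the code is ordinary ADBC, identical in shape to the Postgres examples above.

import adbc_driver_flightsql.dbapi as flight_dbapi

# Placeholder Flight SQL endpoint; everything below is plain ADBC code.
uri = "grpc://localhost:8815"
with flight_dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT 1 AS answer")
        # The driver translates this into Flight RPC calls under the hood;
        # the result still comes back as an Arrow table.
        print(cur.fetch_arrow_table())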

| Feature | ADBC | Flight SQL |
| --- | --- | --- |
| Type | API Specification / Driver Manager | Network Protocol |
| Scope | Local or Remote | Remote (Network) only |
| Language Support | C, Go, Python, Rust | Any language with gRPC |

Summary

ADBC is the "last mile" of interoperability. It ensures that the speed you gained by using Parquet and Arrow isn't lost the moment you need to talk to a SQL database.

