# The interoperability chain of PyArrow components

***

#### The Interoperability Chain

Here is how PyArrow concepts flow together in a real-world pipeline (a code sketch follows the list):

1. **Storage (Parquet/Cloud):** You have a massive dataset of Struct Arrays stored in Parquet on S3.
2. **Discovery (Dataset API):** You use the Dataset API to scan those files. It uses Partition Pruning to only look at the folders you need.
3. **Processing (Streaming):** You don't load the whole thing. You stream the data as a sequence of RecordBatches.
4. **Hand-off (IPC):** You send those batches from your Python worker to a Golang service using the IPC Stream format.
5. **Zero-Copy:** The Go service receives the bytes and, because it understands the Arrow IPC protocol, it accesses the data with Zero-Copy. It doesn't "import" the data; it just starts reading the memory.
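
Here is a minimal PyArrow sketch of the five steps. The bucket, partition column (`year`), and file names are hypothetical, and a local file stands in for the socket to the Go service:

```python
import pyarrow as pa
import pyarrow.dataset as ds

# 1-2. Discovery: scan a partitioned Parquet dataset on S3 (hypothetical bucket).
#      The filter on the partition column lets the Dataset API prune whole folders.
dataset = ds.dataset(
    "s3://my-bucket/events/",   # hypothetical path
    format="parquet",
    partitioning="hive",        # e.g. .../year=2024/month=05/
)
scanner = dataset.scanner(filter=ds.field("year") == 2024)

# 3-4. Streaming + hand-off: iterate RecordBatches and write them to an IPC
#      stream. In production this sink would be the socket to the Go service.
with pa.OSFile("batches.arrows", "wb") as sink:
    with pa.ipc.new_stream(sink, dataset.schema) as writer:
        for batch in scanner.to_batches():
            writer.write_batch(batch)

# 5. Zero-copy: memory-map the bytes and read the Arrow buffers in place.
with pa.memory_map("batches.arrows", "r") as source:
    for batch in pa.ipc.open_stream(source):
        print(batch.num_rows)
```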

#### Where does ADBC fit in the chain?

If Arrow is the **shared language**, and IPC is the **telephone line**, then **ADBC (Arrow Database Connectivity)** is the **standardized connector** to databases.

Before ADBC, if you wanted to get data out of Postgres or Snowflake into Arrow, you had to go through **ODBC** or **JDBC**. Those protocols are row-based, so the database had to convert its data into rows, send them to you, and then you had to re-package them back into Arrow columns. That round trip is a massive waste of CPU.

**ADBC allows the database to stream Arrow RecordBatches directly to your application.**
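
A minimal sketch of that hand-off from Python, assuming the `adbc_driver_postgresql` driver is installed; the connection URI and the `events` table are placeholders:

```python
import adbc_driver_postgresql.dbapi as dbapi

# Hypothetical connection URI.
uri = "postgresql://user:pass@localhost:5432/mydb"

with dbapi.connect(uri) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM events")  # hypothetical table
        # The driver hands back Arrow data directly: fetch_record_batch()
        # returns a pyarrow.RecordBatchReader that streams RecordBatches
        # without any row-by-row conversion.
        reader = cur.fetch_record_batch()
        for batch in reader:
            print(batch.num_rows)
```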

**The updated chain:**

1. **Source:** Snowflake/Postgres (via **ADBC**).
2. **Transfer:** Data arrives in your Python/Go app already as **RecordBatches**.
3. **Processing:** You use the **Dataset API** or **Compute functions**.
4. **Storage:** You save the result to **S3 (Parquet)** or **Local (Feather)**, as sketched below.
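
And a short sketch of the storage step; the table contents and file names are placeholders (point the Parquet path at an S3 URI in practice):

```python
import pyarrow as pa
import pyarrow.feather as feather
import pyarrow.parquet as pq

# Placeholder result table.
table = pa.table({"id": [1, 2, 3], "score": [0.1, 0.5, 0.9]})

# Cloud/archival: compressed, columnar Parquet.
pq.write_table(table, "result.parquet", compression="zstd")

# Local hand-off: Feather v2 is the Arrow IPC file format on disk.
feather.write_feather(table, "result.feather")
```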

***

