# About Arrow IPC/Feather format

***

Arrow is the "bridge" between the storage world (Parquet/ORC) and the compute world (CPU/RAM).

**Feather** (specifically Feather V2) is essentially the **Arrow IPC** (Inter-Process Communication) **format saved to a file**. When you save a `RecordBatch` to disk as a Feather file, you are taking the exact byte-for-byte memory layout of Arrow and dumping it into a file with a small footer.

The "magic" of Arrow IPC is that it is **identical** to the Arrow in-memory format.

***

#### Why use Feather instead of Parquet?

If Parquet is a "**Suitcase**" (packed tight for travel), Feather is a "**Wardrobe**" (everything is already on hangers).

| **Feature**     | **Parquet**                             | **Feather (Arrow IPC)**                     |
| --------------- | --------------------------------------- | ------------------------------------------- |
| **Speed**       | Fast (but needs decompression/decoding) | **Blazing** (Zero-copy / Memory-mapping)    |
| **CPU Usage**   | High (Decompressing data)               | **Near Zero** (Data is ready to use)        |
| **File Size**   | Small (Highly compressed)               | Larger (Uncompressed or lightly compressed) |
| **Portability** | Universal (Spark, Presto, Snowflake)    | Optimized for Python/R/Go/C++               |

#### The "Magic" of Memory Mapping

The coolest thing about Feather is **Memory Mapping** (`mmap`).

When you "read" a Feather file, the operating system doesn't actually load the file into your RAM. Instead, it creates a mapping. Your program thinks the data is in memory, and the OS only pulls the specific bytes from the disk at the exact moment your CPU asks for them.

```python
import pyarrow as pa
import pyarrow.feather as feather

# A small example table to write out
table = pa.table({"x": [1, 2, 3]})

# Writing is straightforward
feather.write_feather(table, 'data.feather')

# Reading with memory mapping
# This is nearly instant, even for a 10GB file
read_table = feather.read_table('data.feather', memory_map=True)
```

***

#### When to use which?

* Use **Parquet for long-term storage**, data lakes (S3/GCS), and sharing data with other teams or tools like Spark and Hive. It saves you money on storage costs.
* Use **Feather/IPC for short-term "warm" data**, passing data between microservices, or local caching. If you have a Python script that processes data and a Go service that needs to read it immediately after, Feather/IPC is the king of speed.

#### A Quick Note on "V1" vs "V2"

You might see old tutorials talking about Feather being limited. **That was V1**.

**Feather V2** (the current version) is fully based on the Apache Arrow IPC format. It supports all Arrow data types, including nested types like **Struct Arrays**, and even supports compression (LZ4 or ZSTD) if you want a middle ground between speed and size.

***

#### Where does this fit in "Streaming" knowledge?

Since Feather is just the **IPC format** in a file, you can actually convert a stream directly into a Feather file. They are two sides of the same coin:

* **IPC Stream:** For sending batches over a network/pipe.
* **Feather File:** For "freezing" those same batches onto a disk.

***
