What does zero-copy mean?
At its core, Zero-Copy is a technique to maximize system performance by minimizing the number of times data is copied between memory buffers.
In a standard computing environment, the CPU spends a surprising amount of time just moving data from "Pile A" to "Pile B" without actually processing it. Zero-copy aims to eliminate these redundant data movements, freeing up the CPU for other tasks and reducing memory bus contention.
Here is the breakdown of the technology, independent of any specific library.
The Core Problem: User Space vs. Kernel Space
To understand zero-copy, you must understand the boundary between your application and the Operating System (OS).
User Space: Where your application (Java, Python, C++ code) runs. It has restricted access to hardware.
Kernel Space: Where the OS kernel runs. It has full access to hardware (Disk, Network, etc.).
The Traditional Approach (The "Slow" Way)
When a standard web server wants to serve a static file (index.html) to a user, it typically performs two system calls: read() (disk to app) and write() (app to network).
This seemingly simple action actually forces 4 context switches and 4 separate data copies:
1. DMA (Direct Memory Access) Copy: The disk controller copies data → Kernel Read Buffer.
2. CPU Copy: The CPU copies data from the Kernel Read Buffer → Application (User) Buffer.
3. CPU Copy: The CPU copies data from the Application Buffer → Kernel Socket Buffer.
4. DMA Copy: The network controller copies data from the Kernel Socket Buffer → Network Card (NIC).
Note: DMA is a specialized hardware module (chip or feature) that allows peripherals (like your Hard Drive, Network Card, or GPU) to read and write to main memory (RAM) without bothering the CPU for every single byte.
The CPU is heavily involved in steps 2 and 3, even though the application didn't change the data at all.
That specific hop—where the CPU has to "touch" the data just to move it from one memory address to another—is precisely the waste that Zero-Copy (and by extension, Arrow/Polars) eliminates.
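The traditional path above can be sketched as a plain read/write loop. This is a minimal, hypothetical example (the function name and buffer size are illustrative, not from any real server): every chunk passes through a user-space buffer on its way from disk to socket.

```python
import socket

# Hypothetical sketch of the traditional ("slow") path. Each chunk is
# copied from the kernel read buffer into the user-space `chunk` object
# (copy 2 above), then back down into the kernel socket buffer (copy 3).
def serve_file_traditional(path: str, sock: socket.socket) -> int:
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(64 * 1024)  # kernel buffer -> user buffer
            if not chunk:
                break
            sock.sendall(chunk)        # user buffer -> kernel socket buffer
            total += len(chunk)
    return total
```

Even though the application never inspects `chunk`, the CPU still has to move every byte up into user space and back down again.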
Visualization: Traditional vs. Zero-Copy
Here is a comparison of the two data paths.
Standard I/O (Read + Write)
Notice how the data has to travel up into User Space and back down.
Zero-Copy (e.g., sendfile)
In a zero-copy operation, the application tells the Kernel: "Take the data from this file descriptor and send it to that socket descriptor. Do not bother sending it to me."
Socket Buffer gets Descriptors: Instead of copying the actual data (the heavy payload) into the Socket Buffer, the CPU only writes descriptors (tiny pointers).
The data never leaves Kernel Space.
Key Technologies & System Calls
Zero-copy is implemented using specific system calls provided by the OS (primarily Linux/Unix).
A. sendfile()
This is the most famous zero-copy syscall. It transfers data directly between two file descriptors (e.g., a file on disk and a network socket) entirely within the kernel.
Usage: Static file servers, video streaming.
Benefit: Eliminates the User Space copies entirely.
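Python exposes this syscall as os.sendfile, so the pattern can be sketched without dropping to C. The function name below is hypothetical; the call assumes a platform where sendfile accepts a socket as the output descriptor (Linux, macOS).

```python
import os
import socket

# Minimal sketch: serve a file over a socket with os.sendfile, so the
# payload moves disk -> page cache -> socket entirely inside the kernel.
# No user-space buffer ever holds the file contents.
def serve_file_zero_copy(path: str, sock: socket.socket) -> int:
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            # out_fd is the socket, in_fd is the file; the kernel does the move
            sent = os.sendfile(sock.fileno(), fd, offset, size - offset)
            if sent == 0:
                break
            offset += sent
        return offset
    finally:
        os.close(fd)
```

Compare this with the traditional loop: the application only shuffles offsets and descriptors, never the bytes themselves.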
B. mmap() (Memory Mapping)
This maps a file descriptor into the application's address space. The application reads the file as if it were a byte array in memory.
Mechanism: When the app reads the memory, the OS loads the file content into the Kernel Page Cache. The "Zero-Copy" aspect here is that the OS doesn't copy from the Kernel Cache to a separate User Buffer; it just maps the User's virtual address to the Kernel's physical address.
Usage: Database engines (like MongoDB or initial versions of Kafka), high-performance processing.
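Python's standard mmap module makes this mapping easy to demonstrate. The helper below is a hypothetical sketch: it maps a file read-only and reads its first bytes through the mapping rather than through read().

```python
import mmap
import os

# Sketch: read the head of a file through a memory mapping. The mapping
# is backed by the kernel page cache, not by a separate user-space
# buffer filled via read().
def head_via_mmap(path: str, n: int) -> bytes:
    fd = os.open(path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mm:
            # The slice itself copies; memoryview(mm) would give a
            # zero-copy view if the bytes only need to be inspected.
            return bytes(mm[:n])
    finally:
        os.close(fd)
```

This is the same trick Arrow-style libraries use to expose an on-disk file as if it were an in-memory byte array.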
C. splice()
This allows moving data between two file descriptors (like sendfile), but it works for pipes as well as sockets, allowing for more complex chaining of data streams without user-space copying.
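Since Python 3.10, os.splice wraps this syscall on Linux. The sketch below is hypothetical and guards for platforms where splice is unavailable, falling back to an ordinary (copying) read/write so the example stays portable; note that splice requires at least one end of the transfer to be a pipe.

```python
import os

# Sketch: move the contents of a regular file into a pipe. On Linux with
# Python 3.10+, os.splice does this without a user-space copy; elsewhere
# we fall back to a plain read/write, which does copy through user space.
def file_to_pipe(path: str) -> bytes:
    r, w = os.pipe()
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "splice"):              # Linux-only syscall wrapper
            moved = os.splice(fd, w, 65536)    # file -> pipe, in-kernel
        else:                                  # portable fallback (copies)
            moved = os.write(w, os.read(fd, 65536))
        os.close(w)
        return os.read(r, moved)
    finally:
        os.close(fd)
        os.close(r)
```

Chaining several splice calls between pipes and sockets is how tools build multi-stage data pipelines that never lift the payload into user space.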
The Role of Hardware (DMA)
True zero-copy relies heavily on Direct Memory Access (DMA).
DMA is a hardware feature that allows peripherals (like Disk Drives and Network Cards) to access system memory (RAM) independently of the Main CPU.
Without DMA: The CPU has to manually read a byte from the disk and write it to RAM.
With DMA: The CPU tells the Disk Controller: "Transfer 10MB to this RAM address and wake me up when you're done." The CPU then goes to do other work.
Real-World Example: Apache Kafka
Apache Kafka is the classic industry example of zero-copy efficiency.
Scenario: A Consumer wants to read messages from a topic.
Action: Kafka uses the sendfile system call.
Result: Data flows from the Disk → OS Cache → Network Card.
The JVM (where Kafka runs) barely touches the data. It manages the pointers and the logic, but the heavy lifting of moving gigabytes of logs is done by the Linux kernel and hardware DMA. This is why Kafka can saturate a 10 Gbps network connection with very low CPU usage.
Key Takeaways
To summarize, Zero-Copy technology is about:
Avoiding Context Switches: Keeping the data in Kernel Space as much as possible.
Bypassing the CPU: Letting specialized hardware (DMA) move the bytes.
Sharing Memory: Mapping virtual addresses to the same physical RAM (as seen in mmap).