COPY


The COPY command is the "high-speed rail" of data movement. While INSERT statements are fine for a few rows, COPY is designed to handle millions of records by moving data directly between a file (or a system stream) and a table.


The COPY Command

The COPY command is the most efficient way to load or export large datasets in PostgreSQL. It avoids much of the per-statement overhead (parsing, planning, and per-row round trips) that makes repeated INSERTs slow.

Two Variants: COPY vs. \copy

It is vital to distinguish between these two, as they behave differently regarding file permissions and locations.

  • COPY (Server-side): The database process itself reads the file. The file must be on the database server's local disk, and the database user must have permission to read it.

  • \copy (Client-side): This is a psql meta-command. It reads a file from your local laptop/client and streams it to the database. No special server permissions are required.
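For instance, a minimal client-side load, assuming a hypothetical users table and a users_export.csv in your current working directory. Note that psql expects the entire \copy command on a single line:

\copy users (id, name, email) FROM 'users_export.csv' WITH (FORMAT csv, HEADER true)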


Practical Syntax

Importing Data (Load)

When importing, you specify the source file, the format, and how to handle headers.

COPY users (id, name, email)
FROM '/data/users_export.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

Exporting Data (Extract)

You can export an entire table or the results of a specific query.
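Two illustrative examples using the same users table; the output paths and the created_at filter are placeholders:

COPY users
TO '/data/users_full.csv'
WITH (FORMAT csv, HEADER true);

COPY (SELECT id, email FROM users WHERE created_at >= DATE '2024-01-01')
TO '/data/active_users_2024.csv'
WITH (FORMAT csv, HEADER true);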


Advanced Features for Data Pipelines

1. The PROGRAM Clause

This allows you to pipe data directly through a shell command before it hits the table. This is incredibly useful for decompressing files on the fly.
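A sketch of an on-the-fly decompression load, assuming gzip is installed on the database server and the .gz path is illustrative. Like file-based COPY, PROGRAM runs on the server and requires superuser or membership in pg_execute_server_program:

COPY users (id, name, email)
FROM PROGRAM 'gunzip -c /data/users_export.csv.gz'
WITH (FORMAT csv, HEADER true);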

2. Handling Nulls and Encodings

You can define what a "Null" looks like in your text file to prevent import errors.

3. Binary Format

If you are moving data between two Postgres databases, use FORMAT binary. It is significantly faster than CSV because it skips the text-to-binary conversion process.
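A minimal round trip using the same illustrative table; the binary format is type-sensitive, so the column definitions must match exactly on both sides:

-- On the source database
COPY users TO '/data/users.bin' WITH (FORMAT binary);

-- On the target database
COPY users FROM '/data/users.bin' WITH (FORMAT binary);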


COPY vs. INSERT (The DE Perspective)

Feature        | INSERT                                   | COPY
---------------|------------------------------------------|-------------------------------------
Speed          | Slow (individual transactions/parsing)   | Ultra-fast (bulk stream)
Error Handling | Can skip one bad row and continue        | Fails the entire batch on one error
Logging        | High WAL overhead                        | Optimized WAL usage
Best For       | Application-level single-row updates     | ETL/ELT bulk loading


Error Handling Strategy

Because COPY is "all-or-nothing," a single malformed row in a 10GB file will cause the whole command to fail.

The Pro Pattern:

  1. COPY data into a Staging Table where all columns are TEXT.

  2. Use SQL to clean/validate the data.

  3. Run INSERT INTO ... SELECT with explicit CASTs to move the clean rows from the staging table into the final production table.
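A minimal sketch of the pattern, assuming a hypothetical staging_users table and the same users target; the numeric-id filter is just one example of a validation rule:

CREATE TABLE staging_users (id TEXT, name TEXT, email TEXT);

COPY staging_users
FROM '/data/users_export.csv'
WITH (FORMAT csv, HEADER true);

-- Promote only rows whose id is numeric, casting as we go
INSERT INTO users (id, name, email)
SELECT id::int, name, lower(email)
FROM staging_users
WHERE id ~ '^[0-9]+$';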


Summary

  • Performance: Always prefer COPY for anything over 10,000 rows.

  • Security: Server-side COPY to or from files requires superuser or membership in the pg_read_server_files / pg_write_server_files roles; \copy does not.

  • Flexibility: Use the PROGRAM option to handle compressed files (gzip) without extracting them to disk first.

