COPY


The COPY command is the "high-speed rail" of data movement. While INSERT statements are fine for a few rows, COPY is designed to handle millions of records by moving data directly between a file (or a system stream) and a table.


The COPY Command

The COPY command is the most efficient way to load or export large datasets in PostgreSQL. It avoids much of the per-statement overhead (parsing, planning, and per-row round trips) that makes repeated INSERTs slow.

Two Variants: COPY vs. \copy

It is vital to distinguish between these two, as they behave differently regarding file permissions and locations.

  • COPY (Server-side): The database process itself reads the file. The file must be on the database server's local disk, and the database user must have permission to read it.

  • \copy (Client-side): This is a psql meta-command. It reads a file from your local laptop/client and streams it to the database. No special server permissions are required.
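For instance, a minimal client-side load, assuming a hypothetical users table and a users_export.csv in your current working directory. Note that psql expects the entire \copy command on a single line:

\copy users (id, name, email) FROM 'users_export.csv' WITH (FORMAT csv, HEADER true)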


Practical Syntax

Importing Data (Load)

When importing, you specify the source file, the format, and how to handle headers.

COPY users (id, name, email)
FROM '/data/users_export.csv'
WITH (FORMAT csv, HEADER true, DELIMITER ',');

Exporting Data (Extract)

You can export an entire table or the results of a specific query.
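Two illustrative examples using the same users table; the output paths and the created_at filter are placeholders:

COPY users
TO '/data/users_full.csv'
WITH (FORMAT csv, HEADER true);

COPY (SELECT id, email FROM users WHERE created_at >= DATE '2024-01-01')
TO '/data/active_users_2024.csv'
WITH (FORMAT csv, HEADER true);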


Advanced Features for Data Pipelines

1. The PROGRAM Clause

This allows you to pipe data directly through a shell command before it hits the table. This is incredibly useful for decompressing files on the fly.
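A sketch of an on-the-fly decompression load, assuming gzip is installed on the database server and the .gz path is illustrative. Like file-based COPY, PROGRAM runs on the server and requires superuser or membership in pg_execute_server_program:

COPY users (id, name, email)
FROM PROGRAM 'gunzip -c /data/users_export.csv.gz'
WITH (FORMAT csv, HEADER true);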

2. Handling Nulls and Encodings

You can define what a "Null" looks like in your text file to prevent import errors.

3. Binary Format

If you are moving data between two Postgres databases, use FORMAT binary. It is significantly faster than CSV because it skips the text-to-binary conversion process.
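A minimal round trip using the same illustrative table; the binary format is type-sensitive, so the column definitions must match exactly on both sides:

-- On the source database
COPY users TO '/data/users.bin' WITH (FORMAT binary);

-- On the target database
COPY users FROM '/data/users.bin' WITH (FORMAT binary);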


COPY vs. INSERT (The DE Perspective)

Feature        | INSERT                                   | COPY
---------------|------------------------------------------|-------------------------------------
Speed          | Slow (individual transactions/parsing)   | Ultra-fast (bulk stream)
Error Handling | Can skip one bad row and continue        | Fails the entire batch on one error
Logging        | High WAL overhead                        | Optimized WAL usage
Best For       | Application-level single-row updates     | ETL/ELT bulk loading


Error Handling Strategy

Because COPY is "all-or-nothing," a single malformed row in a 10GB file will cause the whole command to fail.

The Pro Pattern:

  1. COPY data into a Staging Table where all columns are TEXT.

  2. Use SQL to clean/validate the data.

  3. Run INSERT INTO ... SELECT with explicit CASTs to move the clean rows from the staging table into the final production table.
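A minimal sketch of the pattern, assuming a hypothetical staging_users table and the same users target; the numeric-id filter is just one example of a validation rule:

CREATE TABLE staging_users (id TEXT, name TEXT, email TEXT);

COPY staging_users
FROM '/data/users_export.csv'
WITH (FORMAT csv, HEADER true);

-- Promote only rows whose id is numeric, casting as we go
INSERT INTO users (id, name, email)
SELECT id::int, name, lower(email)
FROM staging_users
WHERE id ~ '^[0-9]+$';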


Summary

  • Performance: Always prefer COPY for anything over 10,000 rows.

  • Security: Server-side COPY to or from files requires superuser or membership in the pg_read_server_files / pg_write_server_files roles; \copy does not.

  • Flexibility: Use the PROGRAM option to handle compressed files (gzip) without extracting them to disk first.

