MERGE


In the world of Data Engineering, the MERGE statement is often called the "Upsert" command. It allows you to perform INSERTs, UPDATEs, and DELETEs in a single, atomic operation based on whether a record already exists in the target table.

While PostgreSQL historically used INSERT ... ON CONFLICT for this, it officially added the SQL-standard MERGE command in PostgreSQL 15.


The Anatomy of a MERGE Statement

A MERGE statement compares a Source (new data) with a Target (existing table) and takes action based on a join condition.

MERGE INTO target_table AS t
USING source_table AS s
ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET name = s.name, updated_at = NOW()
WHEN NOT MATCHED THEN
    INSERT (id, name, created_at) VALUES (s.id, s.name, NOW());

The Three Main Clauses

The power of MERGE lies in its conditional logic:

  • WHEN MATCHED [AND condition]: If the keys match, you typically run an UPDATE. You can add an extra AND to only update if the data has actually changed (to avoid unnecessary disk writes).

  • WHEN NOT MATCHED [BY TARGET]: If the key exists in the source but not the target, you run an INSERT.

  • WHEN NOT MATCHED BY SOURCE: (Supported in some SQL flavors like SQL Server, used sparingly in Postgres via logic) If the key exists in the target but not the source, you might run a DELETE or mark the record as "Inactive."


MERGE vs. INSERT ... ON CONFLICT

Since Postgres has two ways to do "Upserts," you need to know which to choose for your ETL pipelines.

Feature

MERGE (Postgres 15+)

INSERT ... ON CONFLICT

Standard

SQL Standard (Portable).

Postgres-specific.

Flexibility

Can do UPDATE, INSERT, and DELETE.

Limited to UPDATE or NOTHING.

Join Logic

Can join on any complex condition.

Requires a Unique Constraint or Index.

Performance

Great for batch processing/ETL.

Faster for high-concurrency single-row upserts.


Implementation in Data Pipelines (The "SCD Type 1" Pattern)

In Data Warehousing, we use MERGE to maintain Slowly Changing Dimensions (SCD) Type 1, where we overwrite old data with new data.


Visualizing the MERGE Flow

spinner

Summary

  • Atomic: The entire MERGE happens in one transaction. If one part fails, the whole thing rolls back.

  • Efficiency: It reduces "Network Round-trips." Instead of checking if a row exists and then sending a second command, you send everything in one go.

  • Conditionality: Use the AND sub-clause in WHEN MATCHED to prevent "No-op updates" (updating a row with the exact same data), which saves on VACUUM overhead.


Last updated