Triggers


In PostgreSQL, a Trigger is a specialized function that the database executes automatically whenever a specific event occurs—typically an INSERT, UPDATE, DELETE, or TRUNCATE.

As a Data Engineer, you use triggers to enforce complex business rules, maintain audit logs, or synchronize data between tables in real-time without relying on application-level code.


The Two-Step Creation

Unlike other database objects, a trigger in Postgres requires two distinct pieces:

  1. The Trigger Function: A special function (written in PL/pgSQL) that returns a type of TRIGGER.

  2. The Trigger Definition: The actual "tripwire" that tells Postgres which table and event should fire the function.

-- 1. Create the function
CREATE OR REPLACE FUNCTION log_price_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.price <> OLD.price THEN
        INSERT INTO price_history(product_id, old_price, new_price, changed_at)
        VALUES (OLD.id, OLD.price, NEW.price, NOW());
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- 2. Create the trigger
CREATE TRIGGER trg_price_update
AFTER UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION log_price_changes();

Trigger Timing: BEFORE vs. AFTER

The timing of a trigger is crucial for its function:

  • BEFORE: Executes before the data is saved to the disk.

    • Use case: Data validation or auto-formatting (e.g., forcing a username to lowercase before it's saved). You can modify the NEW record here.

  • AFTER: Executes after the data is saved.

    • Use case: Auditing, logging, or updating other tables. You cannot modify the current record here because it's already "done."

  • INSTEAD OF: Used primarily with Views. It tells Postgres, "Don't try to update the view; run this logic instead" (e.g., updating the underlying base tables).


Row-Level vs. Statement-Level

  • FOR EACH ROW: The trigger fires once for every single row affected. If you update 1,000 rows, the trigger runs 1,000 times.

  • FOR EACH STATEMENT: The trigger fires once per command, regardless of how many rows are changed. This is much more efficient for bulk logging or summary table refreshes.


Special Variables

Inside a trigger function, Postgres provides "magic" variables that give you context:

Variable

Description

NEW

The data being inserted or the updated version of a row. (NULL in DELETE).

OLD

The original data before it was updated or deleted. (NULL in INSERT).

TG_OP

The operation type: INSERT, UPDATE, DELETE, or TRUNCATE.

TG_TABLE_NAME

The name of the table that fired the trigger.


Visualizing the Trigger Logic

spinner

Pros and Cons for Data Engineers

The Pros:

  • Reliability: No matter how data enters the DB (Python script, manual SQL, or web app), the trigger will fire.

  • Real-time: Perfect for maintaining "Materialized" summary tables without waiting for a nightly batch job.

The Cons (The "Invisible" Logic):

  • Performance: FOR EACH ROW triggers can significantly slow down bulk COPY or INSERT operations.

  • Debugging: Triggers are "hidden" from the person running the SQL. A simple UPDATE failing because of a hidden, broken trigger can be very frustrating to troubleshoot.

Summary

  • Requirement: Always create the function first, then the trigger.

  • Optimization: Use FOR EACH STATEMENT where possible to reduce overhead.

  • Safety: Use triggers for Auditing (knowing who changed what) and Constraints that are too complex for a standard CHECK clause.


Last updated