Triggers

In PostgreSQL, a Trigger is a specialized function that the database executes automatically whenever a specific event occurs—typically an INSERT, UPDATE, DELETE, or TRUNCATE.

As a Data Engineer, you use triggers to enforce complex business rules, maintain audit logs, or synchronize data between tables in real-time without relying on application-level code.

The Two-Step Creation

Unlike other database objects, a trigger in Postgres requires two distinct pieces:

The Trigger Function: A special function (written in PL/pgSQL) that returns a type of TRIGGER.
The Trigger Definition: The actual "tripwire" that tells Postgres which table and event should fire the function.

-- 1. Create the function
CREATE OR REPLACE FUNCTION log_price_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.price <> OLD.price THEN
        INSERT INTO price_history(product_id, old_price, new_price, changed_at)
        VALUES (OLD.id, OLD.price, NEW.price, NOW());
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- 2. Create the trigger
CREATE TRIGGER trg_price_update
AFTER UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION log_price_changes();

Trigger Timing: BEFORE vs. AFTER

The timing of a trigger is crucial for its function:

BEFORE: Executes before the data is saved to the disk.
- Use case: Data validation or auto-formatting (e.g., forcing a username to lowercase before it's saved). You can modify the NEW record here.
AFTER: Executes after the data is saved.
- Use case: Auditing, logging, or updating other tables. You cannot modify the current record here because it's already "done."
INSTEAD OF: Used primarily with Views. It tells Postgres, "Don't try to update the view; run this logic instead" (e.g., updating the underlying base tables).

Row-Level vs. Statement-Level

FOR EACH ROW: The trigger fires once for every single row affected. If you update 1,000 rows, the trigger runs 1,000 times.
FOR EACH STATEMENT: The trigger fires once per command, regardless of how many rows are changed. This is much more efficient for bulk logging or summary table refreshes.

Special Variables

Inside a trigger function, Postgres provides "magic" variables that give you context:

Variable

Description

NEW

The data being inserted or the updated version of a row. (NULL in DELETE).

OLD

The original data before it was updated or deleted. (NULL in INSERT).

TG_OP

The operation type: INSERT, UPDATE, DELETE, or TRUNCATE.

TG_TABLE_NAME

The name of the table that fired the trigger.

Visualizing the Trigger Logic

Pros and Cons for Data Engineers

The Pros:

Reliability: No matter how data enters the DB (Python script, manual SQL, or web app), the trigger will fire.
Real-time: Perfect for maintaining "Materialized" summary tables without waiting for a nightly batch job.

The Cons (The "Invisible" Logic):

Performance: FOR EACH ROW triggers can significantly slow down bulk COPY or INSERT operations.
Debugging: Triggers are "hidden" from the person running the SQL. A simple UPDATE failing because of a hidden, broken trigger can be very frustrating to troubleshoot.

Summary

Requirement: Always create the function first, then the trigger.
Optimization: Use FOR EACH STATEMENT where possible to reduce overhead.
Safety: Use triggers for Auditing (knowing who changed what) and Constraints that are too complex for a standard CHECK clause.

PreviousPartitioning + Clustering NextStored Procedures

Last updated 18 days ago

hashtagThe Two-Step Creation

hashtagTrigger Timing: BEFORE vs. AFTER

hashtagRow-Level vs. Statement-Level

hashtagSpecial Variables

hashtagVisualizing the Trigger Logic

hashtagPros and Cons for Data Engineers

hashtagSummary