Stored Procedures


In PostgreSQL, Stored Procedures are objects that contain a collection of SQL statements and procedural logic (like loops and if-else branches) stored directly on the database server.

While they are very similar to Functions, they were introduced in PostgreSQL 11 to solve a specific limitation: the ability to manage transactions.


Procedure vs. Function

The most important distinction for a Data Engineer is how they handle "Units of Work."

  • Functions (CREATE FUNCTION): Run inside a single transaction. You cannot "Commit" or "Rollback" inside a function. They are usually designed to return a value (like a calculation).

  • Stored Procedures (CREATE PROCEDURE): Can open and close transactions. This makes them ideal for long-running ETL tasks where you want to commit data in "batches" rather than all at once.

Feature

Function

Stored Procedure

Execution

SELECT my_func();

CALL my_proc();

Transactions

Cannot COMMIT/ROLLBACK.

Can COMMIT/ROLLBACK.

Return Value

Must return a value (or void).

Does not return a value (uses INOUT parameters).

Use Case

Calculations, Data Formatting.

Bulk ETL, Maintenance, Batching.


Basic Syntax

Procedures are typically written in PL/pgSQL.

CREATE OR REPLACE PROCEDURE archive_old_orders(cutoff_date DATE)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Move data to archive
    INSERT INTO orders_archive 
    SELECT * FROM orders WHERE order_date < cutoff_date;

    -- Delete from main table
    DELETE FROM orders WHERE order_date < cutoff_date;

    -- We can commit here to free up locks!
    COMMIT; 
END;
$$;

To run it, you use the CALL keyword:


Key Procedural Logic (PL/pgSQL)

Inside a procedure, you aren't limited to standard SQL. You have access to:

  • Variables: DECLARE my_var INT := 10;

  • Conditionals: IF ... THEN ... ELSE ... END IF;

  • Loops: FOR ... IN ... LOOP ... END LOOP;

  • Error Handling: EXCEPTION WHEN OTHERS THEN ...


Why Data Engineers Use Procedures

  1. Reduced Network Overhead: Instead of sending 50 individual SQL commands from a Python script to the database, you send one CALL command. The database executes the logic locally.

  2. Batch Processing: In a massive update, you can commit every 10,000 rows to prevent the Write-Ahead Log (WAL) from growing too large and to avoid holding locks for hours.

  3. Security: You can give a user permission to CALL a procedure without giving them direct SELECT or DELETE access to the underlying tables (encapsulation).


Visualizing Procedure Execution

spinner

Summary

  • CALL is for Procedures; SELECT is for Functions.

  • Use Procedures when you need Transaction Control (COMMIT/ROLLBACK).

  • Stored logic keeps your ETL "close to the data," improving performance and maintainability.


Last updated