Cron jobs


PostgreSQL Automation: pg_cron

In PostgreSQL, the de facto standard for scheduling tasks is the pg_cron extension. Because Postgres has no native internal scheduler, pg_cron lets you run maintenance and ETL tasks directly from SQL using standard cron syntax.

Setup and Configuration

pg_cron runs as a background worker and must be loaded when the database starts:

  1. Modify postgresql.conf: Add pg_cron to shared_preload_libraries.

    shared_preload_libraries = 'pg_cron'
    cron.database_name = 'postgres'  # The DB where job metadata is stored
  2. Restart Postgres: Required for the library to load.

  3. Create the Extension: Run CREATE EXTENSION pg_cron; in your database.
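Putting steps 1–3 together, a minimal sketch of the SQL side (the GRANT is optional; etl_user is a placeholder for one of your own roles — pg_cron runs each job as the user that scheduled it):

```sql
-- Run once, as a superuser, in the database named in cron.database_name:
CREATE EXTENSION pg_cron;

-- Optionally allow a non-superuser role to schedule its own jobs:
GRANT USAGE ON SCHEMA cron TO etl_user;
```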

Scheduling Syntax

The syntax follows the standard Linux cron format: (minute, hour, day-of-month, month, day-of-week).

-- Schedule a task: SELECT cron.schedule('job_name', 'schedule', 'SQL_command');
SELECT cron.schedule('nightly-maintenance', '0 3 * * *', 'VACUUM ANALYZE;');
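Jobs are managed by name: calling cron.schedule again with the same name updates the existing job, and cron.unschedule removes it.

```sql
-- Change the schedule of the job created above (same name = update in place):
SELECT cron.schedule('nightly-maintenance', '30 3 * * *', 'VACUUM ANALYZE;');

-- Remove the job entirely:
SELECT cron.unschedule('nightly-maintenance');
```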

Vital Maintenance: The VACUUM

Postgres uses MVCC (Multi-Version Concurrency Control). When you UPDATE or DELETE a row, the old version is not immediately removed—it remains on disk as a "dead tuple" (bloat).

  • VACUUM: Reclaims the space occupied by dead tuples so it can be reused by new data.

  • ANALYZE: Updates the statistics used by the query planner. Without fresh stats, the optimizer may choose a slow sequential scan over a fast index scan.

  • The "Pro" Strategy: While Postgres has an autovacuum daemon, Data Engineers often schedule a manual VACUUM ANALYZE at 3:00 AM or immediately following a massive bulk data load to ensure peak performance.
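To judge whether a manual VACUUM is actually needed, you can check the dead-tuple counters Postgres already tracks in pg_stat_user_tables:

```sql
-- Tables carrying the most bloat (dead tuples), and when autovacuum last ran:
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```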


Common Data Engineering Use Cases

| Use Case       | SQL Command Example                                              | Frequency |
| -------------- | ---------------------------------------------------------------- | --------- |
| Reporting      | REFRESH MATERIALIZED VIEW CONCURRENTLY sales_sum;                | Hourly    |
| Partitioning   | CALL create_next_month_partitions();                             | Monthly   |
| Data Retention | DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';  | Weekly    |
| System Health  | VACUUM ANALYZE;                                                  | Nightly   |
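Each row in the table above is just a cron.schedule call. For example, the data-retention job could be registered like this (the logs table and 90-day window come from the example above; dollar-quoting avoids escaping the inner quotes):

```sql
SELECT cron.schedule(
  'purge-old-logs',   -- job name
  '0 2 * * 0',        -- 02:00 every Sunday
  $$DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days'$$
);
```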


Monitoring and Alerting

Since cron jobs run in the background, you must monitor the pg_cron metadata tables to catch failures:

  • cron.job: View all currently scheduled tasks.

  • cron.job_run_details: The most critical table for DEs. It shows:

    • Status: (succeeded or failed)

    • Return Message: Contains the specific error if a job crashed.

    • Start/End Time: Helps identify jobs that are taking longer than expected.

Simple Alerting Query

You can create a view or a script that checks for recent failures:
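A minimal sketch of such a check, querying cron.job_run_details for anything that failed in the last 24 hours:

```sql
SELECT jobid,
       command,
       status,
       return_message,
       start_time,
       end_time
FROM cron.job_run_details
WHERE status = 'failed'
  AND start_time > NOW() - INTERVAL '24 hours'
ORDER BY start_time DESC;
```

Note that cron.job_run_details grows without bound; periodically deleting its old rows is itself a good candidate for a scheduled job.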
