# Cron jobs

***

### PostgreSQL Automation: `pg_cron`

In **PostgreSQL**, the de facto standard for scheduling tasks is the `pg_cron` extension. Because Postgres has no built-in scheduler, `pg_cron` lets you run maintenance and ETL tasks directly from SQL using standard cron syntax.

#### Setup and Configuration

`pg_cron` runs as a background worker and must be loaded when the database starts:

1. **Modify `postgresql.conf`**: Add `pg_cron` to the shared libraries.

   ```
   shared_preload_libraries = 'pg_cron'
   cron.database_name = 'postgres'  # the database where pg_cron stores its metadata
   ```
2. **Restart Postgres**: Required for the library to load.
3. **Create the Extension**: Run `CREATE EXTENSION pg_cron;` in the database named by `cron.database_name`.
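Once the extension is created, a quick sanity check confirms it is installed and that the library was actually preloaded:

```sql
-- Confirm the extension is installed and see its version
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_cron';

-- Confirm the library was preloaded at server start
SHOW shared_preload_libraries;
```

If `shared_preload_libraries` does not include `pg_cron`, the background worker is not running and scheduled jobs will never fire.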

#### Scheduling Syntax

The syntax follows the standard Linux cron format: `minute hour day-of-month month day-of-week`.

```sql
-- Schedule a task: SELECT cron.schedule('job_name', 'schedule', 'SQL_command');
SELECT cron.schedule('nightly-maintenance', '0 3 * * *', 'VACUUM ANALYZE;');
```
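Jobs can be inspected and removed with the companion functions; `cron.unschedule` accepts either the job name or the `jobid` shown in `cron.job`:

```sql
-- List everything currently scheduled
SELECT jobid, jobname, schedule, command FROM cron.job;

-- Remove a job by name (or by jobid)
SELECT cron.unschedule('nightly-maintenance');
```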

***

### Vital Maintenance: The `VACUUM`

Postgres uses **MVCC** (Multi-Version Concurrency Control). When you `UPDATE` or `DELETE` a row, the old version is not immediately removed—it remains on disk as a **"dead tuple"** (bloat).

* **`VACUUM`**: Reclaims the space occupied by dead tuples so it can be reused by new data.
* **`ANALYZE`**: Updates the statistics used by the **Query Planner**. Without fresh stats, the optimizer might choose a slow sequential scan over a fast index scan.
* **The "Pro" Strategy**: While Postgres has an `autovacuum` daemon, Data Engineers often schedule a manual `VACUUM ANALYZE` at 3:00 AM or immediately following a massive bulk data load to ensure peak performance.
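Dead-tuple bloat is visible in the standard statistics views, so you can check whether a table actually needs attention before (or after) vacuuming; this uses the built-in `pg_stat_user_tables` view:

```sql
-- Tables with the most dead tuples, a rough proxy for bloat
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A large `n_dead_tup` alongside a stale `last_vacuum`/`last_autovacuum` is a strong hint that a manual `VACUUM ANALYZE` is overdue.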

***

### Common Data Engineering Use Cases

| **Use Case**       | **SQL Command Example**                                           | **Frequency** |
| ------------------ | ----------------------------------------------------------------- | ------------- |
| **Reporting**      | `REFRESH MATERIALIZED VIEW CONCURRENTLY sales_sum;`               | Hourly        |
| **Partitioning**   | `CALL create_next_month_partitions();`                            | Monthly       |
| **Data Retention** | `DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';` | Weekly        |
| **System Health**  | `VACUUM ANALYZE;`                                                 | Nightly       |
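As a sketch, wiring the retention job from the table above into `pg_cron` looks like this (the job name `'retention-logs'` and the schedule are illustrative; dollar-quoting avoids escaping the quotes inside the command):

```sql
-- Purge log rows older than 90 days, every Sunday at 04:00
SELECT cron.schedule(
  'retention-logs',
  '0 4 * * 0',
  $$DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days'$$
);
```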

***

### Monitoring and Alerting

Since cron jobs run in the background, you must monitor the `pg_cron` metadata tables to catch failures:

* `cron.job`: View all currently scheduled tasks.
* `cron.job_run_details`: The most critical table for DEs. It shows:
  * **Status**: (`succeeded` or `failed`)
  * **Return Message**: Contains the specific error if a job crashed.
  * **Start/End Time**: Helps identify jobs that are taking longer than expected.
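Joining the two tables gives a per-job run history with human-readable names (a sketch using the columns listed above):

```sql
-- Most recent run outcomes, labeled by job name
SELECT j.jobname,
       d.status,
       d.return_message,
       d.start_time,
       d.end_time
FROM cron.job AS j
JOIN cron.job_run_details AS d USING (jobid)
ORDER BY d.start_time DESC
LIMIT 20;
```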

#### Simple Alerting Query

You can create a view or a script that checks for recent failures:

```sql
SELECT jobid, start_time, return_message 
FROM cron.job_run_details 
WHERE status = 'failed' 
AND start_time > NOW() - INTERVAL '24 hours';
```
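Note that `cron.job_run_details` grows indefinitely, so it is common to have `pg_cron` prune its own history; a sketch scheduling a weekly cleanup (the job name is illustrative):

```sql
-- Keep only the last 30 days of run history
SELECT cron.schedule(
  'prune-cron-history',
  '0 5 * * 0',
  $$DELETE FROM cron.job_run_details WHERE end_time < NOW() - INTERVAL '30 days'$$
);
```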

***
