# VACUUM

***

In PostgreSQL, **VACUUM** is the essential process for managing storage and maintaining performance. Because of how Postgres handles data updates (using a system called MVCC), "deleted" data isn't actually removed from the disk immediately.

***

### Why VACUUM is Necessary: MVCC & Bloat

Postgres uses **Multi-Version Concurrency Control (MVCC)**. When you **DELETE** a row, the database simply marks it as "invisible" to future transactions. When you **UPDATE** a row, Postgres marks the old version as invisible and inserts a brand-new version (**a new tuple**).

* **Dead Tuples**: These invisible, old versions of rows are called "dead tuples."
* **Bloat**: If dead tuples aren't cleaned up, your table files grow larger and larger, even if the actual amount of "live" data stays the same. This makes queries slower because the engine has to scan through wasted space.

***

### Types of VACUUM

There are three main "levels" of vacuuming, ranging from standard maintenance to a total rebuild.

#### Standard `VACUUM`

This is the most common form. It scans the table and marks the space occupied by dead tuples as **available for new data**.

* **Locking**: It does **not** lock the table. You can continue to read and write data while it runs.
* **Disk Space**: It usually does **not** return space to the Operating System. It just keeps the space inside the database file for future use.

#### `VACUUM ANALYZE`

This performs a standard vacuum and then immediately runs **ANALYZE**.

* **Purpose**: It updates the **statistics** about the table (how many rows, common values, etc.).
* **Importance**: The Query Planner uses these statistics to decide whether to use an Index Scan or a Table Scan. Without regular `ANALYZE`, the planner makes bad, slow decisions.

#### `VACUUM FULL`

This is the "nuclear option." It creates a brand-new, compacted version of the table on disk and deletes the old, bloated version.

* **Locking**: It requires an **Access Exclusive Lock**. The table is completely unusable (no reads or writes) while this runs.
* **Disk Space**: It **does** return all reclaimed space to the Operating System.
* **Warning:** Only use this during maintenance windows when the table is extremely bloated (e.g., 50% or more dead space).

***

### The Visibility Map

To make standard vacuuming faster, Postgres uses a **Visibility Map**. This is a small file that tracks which "pages" of the table only contain live tuples. `VACUUM` will skip those pages entirely, significantly reducing the amount of work the database has to do.

***

### Practical Commands

| **Command**                  | **Action**                                | **Impact on Performance**                  |
| ---------------------------- | ----------------------------------------- | ------------------------------------------ |
| `VACUUM;`                    | Reclaims space for internal reuse.        | Minimal (can run during the day).          |
| `VACUUM ANALYZE my_table;`   | Reclaims space + updates optimizer stats. | **Recommended** after large data loads.    |
| `VACUUM FULL my_table;`      | Rebuilds the entire table file.           | **High** (Table is locked until finished). |
| `VACUUM (VERBOSE, ANALYZE);` | Shows detailed logs of what was cleaned.  | Useful for debugging bloat issues.         |

***

### Summary: The Lifecycle of a Row

1. **Insert:** A live tuple is created.
2. **Update/Delete**: The tuple is marked "dead" but stays on the disk.
3. **Vacuum**: The engine sees the dead tuple and marks that spot on the disk as "Available."
4. **New Insert:** The engine places a new live tuple in that "Available" spot instead of growing the file.

***
