> For the complete documentation index, see [llms.txt](https://de-learning-logs.gitbook.io/my-de-learning-logs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://de-learning-logs.gitbook.io/my-de-learning-logs/sql/sql-concepts/indexing/inverted-index.md).

# Inverted Index

***

[Video explanation](https://youtu.be/BHCSL_ZifI0?t=695)

***

In the context of PostgreSQL and Data Engineering, an **Inverted Index** is the magic behind high-performance "Full-Text Search" and querying complex data types like arrays and JSONB.

While a standard B-Tree index maps a **row** to a **value**, an Inverted Index maps a **value** (or parts of a value) back to the **rows** that contain it.

***

### How it Works: The "Book Index" Analogy

Think of a textbook.

* A **Standard Index** is like the Table of Contents: "Chapter 5 starts on page 100."
* An **Inverted Index** is like the Index at the back of the book: The word "PostgreSQL" is found on pages 12, 45, and 102.

Instead of scanning every row to see if a word exists, the database looks up the word in the Inverted Index and immediately gets a list of all matching row IDs.

***

### GIN (Generalized Inverted Index)

In PostgreSQL, the most common implementation of an inverted index is the **GIN index**. It is "Generalized" because it doesn't care what the data is—as long as it can be broken down into individual "keys."

#### Common Use Cases:

1. **Full-Text Search**: Breaking a paragraph into individual words (lexemes).
2. **JSONB**: Indexing every key and value inside a JSON blob.
3. **Arrays:** Finding rows where an array contains a specific element.

```sql
-- Creating a GIN index on a JSONB column
CREATE INDEX idx_metadata_gin ON orders USING GIN (metadata);

-- Query that uses the index
SELECT * FROM orders WHERE metadata @> '{"status": "shipped"}';
```

***

### Standard Index (B-Tree) vs. GIN

| **Feature**      | **B-Tree (Standard)**              | **GIN (Inverted)**                               |
| ---------------- | ---------------------------------- | ------------------------------------------------ |
| **Best For**     | Equality (`=`), Ranges (`<`, `>`). | Membership (`@>`), Contained In (`<@`).          |
| **Data Type**    | Integers, Text, Timestamps.        | JSONB, Arrays, TSVector (Text Search).           |
| **Update Speed** | Fast.                              | **Slow** (one row change might update 100 keys). |
| **Size**         | Small.                             | Large (can grow significantly).                  |

***

### Performance Trade-offs for Data Engineers

While GIN indexes make searches incredibly fast, they come with a "Write Tax":

* **Slow Inserts**: When you insert one JSON document with 50 keys, Postgres has to update the GIN index 50 times (one for each key).
* **Maintenance**: GIN indexes are prone to bloat. Frequent updates to indexed JSONB columns will require more aggressive `VACUUM` settings.
* **The "Fast Update" Feature**: To mitigate slow inserts, GIN indexes have a `fastupdate` setting. It stores new entries in a temporary "pending list" and moves them to the main index in batches.

***

### Visualizing the GIN Structure

{% @mermaid/diagram content="graph LR
subgraph Keys\_in\_Index
K1\[Postgres]
K2\[Index]
K3\[Database]
end

```
subgraph Row_Pointers
R1[Row 1, Row 5, Row 10]
R2[Row 2, Row 5]
R3[Row 1, Row 2, Row 3]
end

K1 --> R1
K2 --> R2
K3 --> R3

style K1 fill:#f9f,stroke:#333
style R1 fill:#bbf,stroke:#333" %}
```

***

### Summary

* **Inverted Indexes** are essential for querying "Inside" data (JSON keys, Array elements, Words in text).
* **GIN** is the Postgres implementation of this concept.
* **Use GIN** when you need to search for a value that exists as a small part of a larger column.
* **Avoid GIN** on columns that are updated constantly, as the overhead of re-indexing every key can cripple write performance.

***


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://de-learning-logs.gitbook.io/my-de-learning-logs/sql/sql-concepts/indexing/inverted-index.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
