# Window functions

***

Another [explanation](https://datalemur.com/blog/sql-interview-cheat-sheet)

***

## **Window Functions in SQL (Complete Guide)**

Window functions allow you to perform calculations **across a set of related rows** without collapsing them into a single output row (unlike `GROUP BY`).

Think of window functions as “running calculations” over a **window**, i.e., a defined set of rows relative to the current row.

***

### **Basic Anatomy of a Window Function**

Every window function has the following pattern:

```sql
function_name(expression) OVER (
    PARTITION BY ...
    ORDER BY ...
    ROWS or RANGE ...
)
```

**Three optional components of the window:**

* **PARTITION BY** — splits data into groups (like `GROUP BY` but non-destructive). In this context, **"non-destructive"** means that `PARTITION BY` divides your data into groups **without reducing the number of rows** in your result set.
* **ORDER BY** — defines row ordering *within* partitions.
* **Frame clause** (`ROWS` or `RANGE`) — defines which rows are visible to each calculation.

Example:

```sql
SELECT
    order_id,
    customer_id,
    amount,
    SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total
FROM orders;
```

***

#### **Categories of Window Functions**

There are **four main categories**:

1. **Aggregate window functions**
2. **Ranking window functions**
3. **Value window functions** (LAG/LEAD, FIRST\_VALUE, LAST\_VALUE)
4. **Distribution window functions** (PERCENT\_RANK, NTILE, CUME\_DIST)

Below is each one in detail.

***

### **Aggregate Window Functions**

These apply aggregate functions but **do not collapse rows**.

Common functions:

* `SUM()`
* `COUNT()`
* `AVG()`
* `MIN()`
* `MAX()`

**Example: Running total per customer**

```sql
SELECT 
    customer_id,
    order_id,
    amount,
    SUM(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM orders;
```

**Example: Moving average (3-row sliding window)**

```sql
SELECT
    order_date,
    amount,
    AVG(amount) OVER (
        ORDER BY order_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM daily_sales;
```

**Example: Total per partition**

```sql
SELECT
    department,
    employee_name,
    salary,
    SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM employees;
```

***

### **Ranking Window Functions**

Used for analytics/OLAP, leaderboards, top-N filtering, etc.

#### Ranking functions:

* `ROW_NUMBER()`
* `RANK()`
* `DENSE_RANK()`
* `PERCENT_RANK()`

***

**Example: ROW\_NUMBER (no ties)**

```sql
SELECT
    order_id,
    customer_id,
    amount,
    ROW_NUMBER() OVER(
        PARTITION BY customer_id ORDER BY amount DESC
    ) AS rank_by_amount
FROM orders;
```

**Example: RANK vs DENSE\_RANK**

```sql
SELECT
    employee_name,
    salary,
    RANK()        OVER (ORDER BY salary DESC) AS rank_salary,
    DENSE_RANK()  OVER (ORDER BY salary DESC) AS dense_rank_salary
FROM employees;
```

**Difference:**

* `RANK()` skips numbers on ties (1,2,2,4,…)
* `DENSE_RANK()` does not skip (1,2,2,3,…)

***

### **Value Functions (LAG/LEAD, FIRST\_VALUE, LAST\_VALUE)**

These are essential for **change detection**, **period comparisons**, **time-series analytics**, and **before/after analysis**.

**LAG / LEAD**

* `LAG()` → looks at previous row
* `LEAD()` → looks at next row

**Example: Month-over-month difference**

```sql
SELECT
    month,
    revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
FROM monthly_revenue;
```

**Example: Compare an event to next event**

```sql
SELECT
    event_time,
    value,
    LEAD(value) OVER (ORDER BY event_time) - value AS next_diff
FROM metrics;
```

***

**FIRST\_VALUE / LAST\_VALUE**

**Example: Price relative to first purchase**

```sql
SELECT
    customer_id,
    purchase_date,
    price,
    price - FIRST_VALUE(price) OVER (
        PARTITION BY customer_id ORDER BY purchase_date
    ) AS price_diff_from_first
FROM purchases;
```

> Important: `LAST_VALUE()` often requires a frame clause:\
> `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`

**Example: Last order value per customer**

```sql
SELECT
    customer_id,
    order_id,
    LAST_VALUE(amount) OVER (
        PARTITION BY customer_id
        ORDER BY order_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_order_amount
FROM orders;
```

***

### **Distribution Window Functions**

These help quantify *relative position* within a partition — extremely useful for scoring, percentiles, and statistical ranking.

**Functions:**

* `NTILE(n)` — splits into N buckets
* `CUME_DIST()` — cumulative distribution
* `PERCENT_RANK()` — relative rank from 0 to 1

***

**Example: NTILE (quartiles)**

```sql
SELECT
    employee_name,
    salary,
    NTILE(4) OVER (ORDER BY salary) AS salary_quartile
FROM employees;
```

**Example: CUME\_DIST**

```sql
SELECT
    score,
    CUME_DIST() OVER (ORDER BY score) AS cumulative_dist
FROM test_scores;
```

**Example: PERCENT\_RANK**

```sql
SELECT
    product_id,
    revenue,
    PERCENT_RANK() OVER (ORDER BY revenue DESC) AS percentile_rank
FROM product_performance;
```

***

### **Frame Clauses (ROWS vs RANGE)**

Probably the most misunderstood part of window functions.

**ROWS: Physical row-based**

Example:

* `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW`\
  → previous 2 rows, regardless of values

**RANGE: Value-based**

Example:

* `RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW`\
  → all rows within the last 7 days, regardless of count

**Example Comparison**

```sql
SUM(amount) OVER (
    ORDER BY order_date
    ROWS 3 PRECEDING
);

SUM(amount) OVER (
    ORDER BY order_date
    RANGE INTERVAL '3 days' PRECEDING
);
```

***

### **Correlated Window Functions**

You can combine window functions with filtering to express advanced analytics patterns.

**Example: Top 3 orders per customer**

```sql
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER(
            PARTITION BY customer_id ORDER BY amount DESC
        ) AS rn
    FROM orders
) ranked
WHERE rn <= 3;
```

***

### **Practical Analytics Use Cases**

**1. Churn Analysis**

```sql
SELECT
    customer_id,
    month,
    LAG(activity_month) OVER (PARTITION BY customer_id ORDER BY month) AS previous_month
FROM usage_log;
```

**2. Running totals with resets**

```sql
SUM(sales) OVER (
    PARTITION BY region
    ORDER BY day
) AS cumulative_sales
```

**3. Gaps and Islands**

Detecting breaks in sequences:

```sql
SELECT
    event_time,
    value,
    LAG(event_time) OVER (ORDER BY event_time) AS previous_time
FROM events;
```

**4. Top performers per category**

```sql
RANK() OVER (PARTITION BY category ORDER BY sales DESC)
```

***

### **Summary Table**

| Category     | Functions                            | Purpose                         |
| ------------ | ------------------------------------ | ------------------------------- |
| Aggregate    | `SUM, AVG, MIN, MAX, COUNT`          | Running totals, moving averages |
| Ranking      | `ROW_NUMBER, RANK, DENSE_RANK`       | Top-N, leaderboards             |
| Value        | `LAG, LEAD, FIRST_VALUE, LAST_VALUE` | Period comparisons, time-series |
| Distribution | `NTILE, CUME_DIST, PERCENT_RANK`     | Percentiles, bucketization      |
| Frame-based  | `ROWS, RANGE`                        | Window sizing logic             |

***


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://de-learning-logs.gitbook.io/my-de-learning-logs/sql/sql-concepts/window-functions.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
