Aggregations and analysis of data

DuckDB provides several powerful SQL features that make complex analytical tasks much easier than in standard SQL.

`ASOF JOIN` (Time-Series Alignment)

ASOF Joins are a game-changer for time-series data. They allow you to join two tables where timestamps don't exactly match—for example, joining a "Trades" table with a "Stock Prices" table to find the price as of the moment the trade happened.

Key Behavior: For each row in the left table, it finds the single closest match in the right table that is less than or equal to (or greater than/equal to) the left timestamp.

Example:

SELECT t.time, t.symbol, p.price
FROM trades t
ASOF JOIN prices p 
  ON t.symbol = p.symbol 
 AND t.time >= p.time;

This ensures you get the "most recent price before or at the time of the trade" without complex subqueries.

Table Functions

In DuckDB, functions can return entire tables instead of single values. These are often used for metadata or generating data on the fly.

range(): Generates a sequence of numbers.

SELECT * FROM range(5); -- Returns rows 0, 1, 2, 3, 4

duckdb_extensions(): A metadata function that returns a table of all extensions.
```
SELECT extension_name, loaded FROM duckdb_extensions();
```

`LATERAL` Joins (Correlated Subqueries)

A LATERAL join (often called a "correlated subquery in the FROM clause") allows a subquery to refer to columns from tables defined to its left in the FROM clause.

Use Case: When you need to run a function or a complex subquery for every single row of your main table.

Example: Finding the top 2 most expensive items for every category.

SELECT c.name, top_items.item_name
FROM categories c
CROSS JOIN LATERAL (
    SELECT item_name 
    FROM items i 
    WHERE i.category_id = c.id 
    ORDER BY price DESC 
    LIMIT 2
) AS top_items;

`PIVOT` and `UNPIVOT` Statements

DuckDB has one of the simplest PIVOT syntaxes in the industry. It transforms rows into columns (Wide format) or columns into rows (Long format).

PIVOT: Automatically detects values to create columns.

-- Turns months into columns and sums the amount
PIVOT sales ON month USING sum(amount);

UNPIVOT: Useful for cleaning data where years or months were mistakenly used as column headers.
```
UNPIVOT wide_data ON 2022, 2023, 2024 INTO NAME year VALUE revenue;
```

Advanced Filtering: `QUALIFY` and `FILTER`

QUALIFY: This filters the results of Window Functions (like ROW_NUMBER() or RANK()). It saves you from having to wrap your query in a CTE just to filter by rank.
```
SELECT user_id, score, rank() OVER (ORDER BY score DESC) as r
FROM leaderboard
QUALIFY r <= 3; -- Keeps only the top 3 scores
```
FILTER: This is used inside aggregate functions to perform "conditional aggregation" more cleanly than using CASE WHEN.
```
SELECT 
    count(*) AS total_count,
    count(*) FILTER (WHERE status = 'error') AS error_count
FROM logs;
```

`GROUPING SETS`, `CUBE`, and `ROLLUP`

These are used for multi-level reporting. They allow you to calculate multiple levels of "totals" and "sub-totals" in a single query.

ROLLUP: Creates a hierarchy of totals (e.g., Year total -> Month total -> Day total).
CUBE: Calculates every possible combination of totals for the columns listed.

Previous"Attaching" databases (Database Integration)NextHow DuckDB optimizes its performance

Last updated 16 days ago

hashtagASOF JOIN (Time-Series Alignment)

hashtagTable Functions

hashtagLATERAL Joins (Correlated Subqueries)

hashtagPIVOT and UNPIVOT Statements

hashtagAdvanced Filtering: QUALIFY and FILTER

hashtagGROUPING SETS, CUBE, and ROLLUP