Aggregations and analysis of data


DuckDB provides several powerful SQL features that make complex analytical tasks much easier than in standard SQL.


ASOF JOIN (Time-Series Alignment)

ASOF Joins are a game-changer for time-series data. They allow you to join two tables where timestamps don't exactly match—for example, joining a "Trades" table with a "Stock Prices" table to find the price as of the moment the trade happened.

  • Key Behavior: For each row in the left table, it finds the single closest match in the right table that is less than or equal to (or greater than/equal to) the left timestamp.

  • Example:

    SELECT t.time, t.symbol, p.price
    FROM trades t
    ASOF JOIN prices p 
      ON t.symbol = p.symbol 
     AND t.time >= p.time;

    This ensures you get the "most recent price before or at the time of the trade" without complex subqueries.


Table Functions

In DuckDB, functions can return entire tables instead of single values. These are often used for metadata or generating data on the fly.

  • range(): Generates a sequence of numbers.

    SELECT * FROM range(5); -- Returns rows 0, 1, 2, 3, 4
  • duckdb_extensions(): A metadata function that returns a table of all extensions.

    SELECT extension_name, loaded FROM duckdb_extensions();

LATERAL Joins (Correlated Subqueries)

A LATERAL join (often called a "correlated subquery in the FROM clause") allows a subquery to refer to columns from tables defined to its left in the FROM clause.

  • Use Case: When you need to run a function or a complex subquery for every single row of your main table.

  • Example: Finding the top 2 most expensive items for every category.


PIVOT and UNPIVOT Statements

DuckDB has one of the simplest PIVOT syntaxes in the industry. It transforms rows into columns (Wide format) or columns into rows (Long format).

  • PIVOT: Automatically detects values to create columns.

  • UNPIVOT: Useful for cleaning data where years or months were mistakenly used as column headers.


Advanced Filtering: QUALIFY and FILTER

  • QUALIFY: This filters the results of Window Functions (like ROW_NUMBER() or RANK()). It saves you from having to wrap your query in a CTE just to filter by rank.

  • FILTER: This is used inside aggregate functions to perform "conditional aggregation" more cleanly than using CASE WHEN.


GROUPING SETS, CUBE, and ROLLUP

These are used for multi-level reporting. They allow you to calculate multiple levels of "totals" and "sub-totals" in a single query.

  • ROLLUP: Creates a hierarchy of totals (e.g., Year total -> Month total -> Day total).

  • CUBE: Calculates every possible combination of totals for the columns listed.


Last updated