COLUMNS()


In DuckDB, COLUMNS() is a "meta-programming" feature that lets you apply logic to multiple columns at once without typing their names. Think of it as Regular Expressions for your SELECT list.

It solves the "wall of text" problem when dealing with wide tables (e.g., datasets with 100+ sensors or financial metrics).


The Basics: Selecting Multiple Columns

Instead of listing every column, you can use a string pattern or a regex to grab what you need.

  • By Pattern: SELECT COLUMNS('sales_.*') FROM table (Selects all columns starting with "sales_").

  • By Type: SELECT COLUMNS(c -> c.type = 'INTEGER') FROM table (Selects only integer columns).


2. Applying Functions to Everything

The real power of COLUMNS() is "projection." You can wrap a function around COLUMNS() to apply it to every matching field simultaneously.

The Scenario: You have a table with 50 different price columns and you need to round all of them to 2 decimal places.

The DuckDB way:

SELECT ROUND(COLUMNS('price_.*'), 2) FROM products;

This is equivalent to writing ROUND(price_usd, 2), ROUND(price_eur, 2)... for every single match.


Combining with Aggregates

You can use COLUMNS() inside aggregate functions to perform bulk analysis.

Example: Get the maximum value of every "sensor" column:


Advanced: Lambda Functions

DuckDB allows you to pass a lambda (a mini-function) into COLUMNS() to filter columns based on complex logic.


Why this is a "Killer Feature"

In standard SQL (like Postgres or MySQL), if you want to perform a calculation on 20 columns, you have to write it 20 times or use a Python/dbt script to generate the SQL.

DuckDB’s COLUMNS() makes the SQL itself dynamic.

Comparison: Standard SQL vs. DuckDB

Feature

Standard SQL (Postgres)

DuckDB COLUMNS()

Effort

Manual listing

Pattern-based

Maintenance

High (if table schema changes)

Low (adapts to new columns)

Bulk Formatting

Requires CTEs or scripting

Single-line function


Wait, there's more: You can actually use COLUMNS() inside a WHERE clause too. For example: WHERE COLUMNS('price.*') > 0 would filter for rows where every price column is positive.


Last updated