CTEs - Common Table Expressions

CTEs (Common Table Expressions)

CTEs only exist within the context of a single SQL statement. They are typically used in more complex queries to improve readability, structure, and modularity. A CTE is introduced with the WITH keyword and behaves like a temporary, named result set that can be referenced by the main query.

The difference between CTEs and subqueries is primarily in how they are processed: CTEs allow naming and reusing a result set, improve clarity, and (depending on the SQL engine) may be optimized or materialized differently. They also uniquely support recursion.

Syntax

The WITH clause binds the output of the inner query to a temporary table with the given name.

Example: Create a CTE called cteName that contains a single value:

WITH cteName AS (
    SELECT 1
)
SELECT * FROM cteName;

Column names may be bound explicitly:

WITH cteName (col1, col2) AS (
    SELECT 1, 2
)
SELECT col1 + col2
FROM cteName;

A single query may contain multiple CTE declarations:

WITH
    cte1 (col1) AS (SELECT 1),
    cte2 (col2) AS (SELECT 2)
SELECT *
FROM cte1, cte2;

Example of using CTEs in dbt

-- 1. Import CTEs (Bring in your raw data)
WITH raw_trips AS (
    SELECT * FROM {{ ref('stg_ny_taxi_raw') }}
),

-- 2. Logical CTEs (Perform your transformations/cleaning)
cleaned_trips AS (
    SELECT
        vendorid,
        tpep_pickup_datetime,
        -- Using standard SQL casting
        CAST(fare_amount AS DECIMAL(10,2)) AS fare,
        trip_distance
    FROM raw_trips
    WHERE trip_distance > 0
),

-- 3. Final CTE (The one that actually gets "materialized")
final AS (
    SELECT
        vendorid,
        SUM(fare) AS total_revenue,
        AVG(trip_distance) AS avg_distance
    FROM cleaned_trips
    GROUP BY 1
)

-- 4. Output (dbt uses this to build the actual table/view)
SELECT * FROM final;

Recursive CTEs

Adding the RECURSIVE keyword after WITH allows a CTE to reference itself. This enables recursion in SQL, such as generating sequences or traversing hierarchical data.

The Anatomy of a Recursive CTE

A Recursive CTE always has three specific parts. If you miss one, it won't work.

The Anchor Member (The Starter): This runs once at the very beginning. It creates the initial row(s).
The Recursive Member (The Loop): This references the CTE itself. It takes the output of the previous step and uses it to generate new rows.
The Terminator (The Stop Sign): This is usually a WHERE clause inside the recursive member. Without this, your query runs forever (infinite loop).

Visualizing the Execution

Let's look at your example of counting 1 to 10.

WITH RECURSIVE cteSource (counter) AS (
    -- 1. ANCHOR MEMBER
    SELECT 1

    UNION ALL

    -- 2. RECURSIVE MEMBER
    SELECT counter + 1
    FROM cteSource
    WHERE counter < 10  -- 3. TERMINATOR
)
SELECT * FROM cteSource;

Step 1: The Anchor Runs

SQL runs SELECT 1.
Result so far: [1]
Note: This row is now effectively "passed" to the recursive part.

Step 2: Recursion Round 1

SQL looks at the previous result (1).
It runs SELECT counter + 1 (which is 1 + 1).
It checks the terminator: Is 1 < 10? Yes.
Result so far: [1, 2]

Step 3: Recursion Round 2

SQL looks at the new result from the last step (2).
It runs SELECT counter + 1 (which is 2 + 1).
It checks the terminator: Is 2 < 10? Yes.
Result so far: [1, 2, 3]

... Steps 4 through 9 happen ...

Step 10: The Stop

SQL looks at the previous result (10).
It runs SELECT counter + 1 (which is 11).
It checks the terminator: Is 10 < 10? NO.
The recursive part returns nothing.
The loop breaks.
Final Result: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

PreviousSubqueries NextViews, Materialized Views, and Temporary Tables

Last updated 23 days ago

hashtagCTEs (Common Table Expressions)

hashtagSyntax

hashtagRecursive CTEs

hashtagThe Anatomy of a Recursive CTE

hashtagVisualizing the Execution