Abstract Syntax Tree


In the context of SQL processing, optimizing the Abstract Syntax Tree (AST) is the bridge between a human-readable query and a high-performance machine plan. In PostgreSQL, this phase is technically known as Query Rewriting and Logical Optimization.

The database doesn't execute the AST directly. Instead, it transforms it into a Query Tree and then "simplifies" it before the Optimizer ever looks at table statistics.


Logical Simplification (Constant Folding)

The first way an AST is optimized is by resolving anything that doesn't require a disk hit.

  • Math Resolution: WHERE price > 10 * 5 is transformed into WHERE price > 50.

  • Function Simplification: Certain immutable functions are resolved early so the executor doesn't have to calculate them for every row.

  • Contradiction Detection: If you write WHERE 1=0, the rewriter identifies the "short-circuit" and can return an empty result set without even scanning the table.


Predicate Pushdown

This is one of the most important AST optimizations. The goal is to "push" filters as close to the data source as possible.

  • The Logic: If you join two tables and then apply a WHERE clause, the optimizer tries to move that WHERE clause inside the scan of the base table.

  • Why? It is much cheaper to discard 1 million rows during the initial scan than to join those 1 million rows and discard them at the very end.


Subquery Unrolling (Flattening)

The AST often contains nested subqueries. A key optimization step is "flattening" these into Joins where possible.

  • Subquery: SELECT * FROM users WHERE id IN (SELECT user_id FROM orders)

  • Optimized AST: The rewriter may transform this into a SEMI-JOIN. Semi-joins are significantly faster because the database can stop looking for a user once it finds the first matching order, rather than processing all orders.


View Merging

When you query a View, the AST for the View is merged into the AST of the main query.

  • The optimizer doesn't run the View, save it to a temp table, and then run your query.

  • It combines the two definitions into a single tree so it can optimize the entire logic at once (like pushing your main query's filters into the View's logic).


Summary of the Transformation Flow

spinner

How a Data Engineer Influences AST Optimization

You can't "edit" the AST directly, but you can write SQL that is easier for the rewriter to handle:

  1. Avoid "Opaque" Functions: If you wrap a column in a function (e.g., WHERE LOWER(name) = 'gemini'), you prevent the optimizer from pushing that predicate down to an index scan.

  2. Use Common Table Expressions (CTEs): In older versions of Postgres, CTEs acted as "optimization fences" (they were materialized). In modern Postgres, the rewriter can "inline" them, allowing for better AST optimization across the CTE boundary.

  3. Union All vs Union: Use UNION ALL if you don't need to de-duplicate. UNION adds a "Unique" node to the AST, which is a costly operation.


Last updated