EXPLAIN and EXPLAIN ANALYZE


In SQL, the EXPLAIN command is your "X-ray vision." It allows you to see the Execution Plan—the step-by-step roadmap the database optimizer creates to retrieve your data.

As a Data Engineer, this is your primary tool for moving from "the query is slow" to "I know exactly why it's slow."


What is EXPLAIN?

When you run EXPLAIN, the database does not actually run your query. Instead, it analyzes the query and returns a report of how it plans to execute it.

It reveals:

  • The Scan Type: Is it a "Sequential Scan" (reading the whole table) or an "Index Scan" (using a shortcut)?

  • The Join Algorithm: Is it using a Hash Join, Merge Join, or Nested Loop?

  • Estimated Cost: An arbitrary number representing the "effort" (CPU and I/O) required.

  • Estimated Rows: How many rows the optimizer thinks it will process at each step.

Example Syntax:

EXPLAIN
SELECT * FROM taxi_trips 
WHERE fare_amount > 100;

What is EXPLAIN ANALYZE?

EXPLAIN ANALYZE is far more powerful. It actually executes the query, throws away the result set, and provides a report comparing the estimated costs against the actual time taken.

  • Actual Time: The real clock time (in milliseconds) each step took.

  • Actual Loops: How many times a specific operation had to be repeated.

  • Memory Usage: How much RAM was used for sorts or joins.

Example Syntax:


Key Differences: Estimate vs. Reality

Feature

EXPLAIN

EXPLAIN ANALYZE

Execution

Does not run the query.

Runs the query.

Side Effects

Safe to run on DELETE or UPDATE.

Will actually delete/update data!

Accuracy

Based on statistics (might be outdated).

100% accurate (real-world performance).

Information

Estimated cost/rows.

Actual time, memory, and disk I/O.

Warning: Never run EXPLAIN ANALYZE on a DELETE or DROP statement in production unless you wrap it in a transaction that you intend to rollback!


What to Look for in a Plan (Red Flags)

When reading the output of an Explain plan, keep an eye out for these performance killers:

  1. Seq Scan (Sequential Scan): This means the database is reading every single row on the disk. If the table is large, you likely need an Index.

  2. External Merge / Disk Sort: This happens when a SORT or GROUP BY is too big for the allocated RAM (work_mem) and spills to the slow hard drive.

  3. Nested Loop: On large tables, this can be incredibly slow. It indicates the database is looping through Table B for every single row in Table A.

  4. Row Mis-estimates: If EXPLAIN estimates 10 rows but ANALYZE shows 1,000,000, your database statistics are stale, and the optimizer is making bad decisions based on old info.


Summary

  • EXPLAIN is for planning and safety (checking if a query will be slow before running it).

  • EXPLAIN ANALYZE is for profiling and debugging (seeing where the actual bottlenecks are).


Last updated