EXPLAIN and EXPLAIN ANALYZE
In SQL, the EXPLAIN command is your "X-ray vision." It allows you to see the Execution Plan—the step-by-step roadmap the database optimizer creates to retrieve your data.
As a Data Engineer, this is your primary tool for moving from "the query is slow" to "I know exactly why it's slow."
What is EXPLAIN?
When you run EXPLAIN, the database does not actually run your query. Instead, it analyzes the query and returns a report of how it plans to execute it.
It reveals:
The Scan Type: Is it a "Sequential Scan" (reading the whole table) or an "Index Scan" (using a shortcut)?
The Join Algorithm: Is it using a Hash Join, Merge Join, or Nested Loop?
Estimated Cost: An arbitrary number representing the "effort" (CPU and I/O) required.
Estimated Rows: How many rows the optimizer thinks it will process at each step.
Example Syntax:
EXPLAIN
SELECT * FROM taxi_trips
WHERE fare_amount > 100;What is EXPLAIN ANALYZE?
EXPLAIN ANALYZE is far more powerful. It actually executes the query, throws away the result set, and provides a report comparing the estimated costs against the actual time taken.
Actual Time: The real clock time (in milliseconds) each step took.
Actual Loops: How many times a specific operation had to be repeated.
Memory Usage: How much RAM was used for sorts or joins.
Example Syntax:
Key Differences: Estimate vs. Reality
Feature
EXPLAIN
EXPLAIN ANALYZE
Execution
Does not run the query.
Runs the query.
Side Effects
Safe to run on DELETE or UPDATE.
Will actually delete/update data!
Accuracy
Based on statistics (might be outdated).
100% accurate (real-world performance).
Information
Estimated cost/rows.
Actual time, memory, and disk I/O.
Warning: Never run
EXPLAIN ANALYZEon aDELETEorDROPstatement in production unless you wrap it in a transaction that you intend to rollback!
What to Look for in a Plan (Red Flags)
When reading the output of an Explain plan, keep an eye out for these performance killers:
Seq Scan (Sequential Scan): This means the database is reading every single row on the disk. If the table is large, you likely need an Index.
External Merge / Disk Sort: This happens when a
SORTorGROUP BYis too big for the allocated RAM (work_mem) and spills to the slow hard drive.Nested Loop: On large tables, this can be incredibly slow. It indicates the database is looping through Table B for every single row in Table A.
Row Mis-estimates: If
EXPLAINestimates 10 rows butANALYZEshows 1,000,000, your database statistics are stale, and the optimizer is making bad decisions based on old info.
Summary
EXPLAINis for planning and safety (checking if a query will be slow before running it).EXPLAIN ANALYZEis for profiling and debugging (seeing where the actual bottlenecks are).
Last updated