Physical joins


Video explainerarrow-up-right


Physical joins are the actual algorithms that database engines use to execute join operations. While logical joins (INNER, LEFT, OUTER) describe what you want, physical joins determine how the database retrieves and combines the data.

Logical join = what you ask for Physical join = how the database actually performs it

The Three Main Physical Join Algorithms

1. Nested Loop Join

The simplest approach: for each row in the first table, scan through all rows in the second table looking for matches.

How it works:

Example:

If employees has 1,000 rows and departments has 10 rows, the database might scan departments 1,000 times (worst case: 10,000 comparisons).

Best for:

  • Small tables

  • When one table is tiny (the "inner" table)

  • When the outer table returns very few rows after filtering

Visual representation:


2. Hash Join

The database builds a hash table from one table, then probes it with rows from the other table.

How it works:

Example with same query:

Build phase (departments):

Probe phase (employees):

Best for:

  • Large tables with no indexes

  • Equality joins (=) only

  • When you have sufficient memory

  • When tables are similar in size

Performance: O(n + m) where n and m are table sizes


3. Merge Join (Sort-Merge Join)

Both tables are sorted by the join key, then merged together like a zipper.

How it works:

Visual representation:

Best for:

  • Tables already sorted (or indexed) on join columns

  • Large tables

  • Non-equality joins (>, <, BETWEEN)

  • When output needs to be sorted anyway

Performance: O(n log n + m log m) if sorting needed, O(n + m) if already sorted


Practical Example Comparison

Let's say you're joining orders (1 million rows) with customers (100,000 rows):

Query Optimizer's Decision Process:

  1. Nested Loop: Used if the WHERE clause filters orders down to ~100 rows. Then it's efficient to loop through customers 100 times.

  2. Hash Join: Used if no good indexes exist. Build hash table on customers (smaller table), probe with orders. Fast for large datasets.

  3. Merge Join: Used if both tables have indexes on customer_id. No sorting needed, just merge the sorted data streams.


How to See What Your Database Is Using

Look for keywords like:

  • "Nested Loop"

  • "Hash Join" or "Hash Match"

  • "Merge Join"

Key Takeaways

  • You don't choose physical joins directly—the query optimizer does

  • Indexes matter hugely: They can make nested loops efficient and enable merge joins

  • Statistics matter: The optimizer uses table statistics to choose the best algorithm

  • Table size matters: Small tables favor nested loops, large tables favor hash/merge

  • Memory matters: Hash joins need RAM; if unavailable, optimizer may choose differently

The optimizer considers costs like I/O operations, CPU usage, and memory before selecting the most efficient physical join for your query.

Last updated