Physical joins
Physical joins are the actual algorithms that database engines use to execute join operations. While logical joins (INNER, LEFT, OUTER) describe what you want, physical joins determine how the database retrieves and combines the data.
Logical join = what you ask for Physical join = how the database actually performs it
The Three Main Physical Join Algorithms
1. Nested Loop Join
The simplest approach: for each row in the first table, scan through all rows in the second table looking for matches.
How it works:
Example:
If employees has 1,000 rows and departments has 10 rows, the database might scan departments 1,000 times (worst case: 10,000 comparisons).
Best for:
Small tables
When one table is tiny (the "inner" table)
When the outer table returns very few rows after filtering
Visual representation:
2. Hash Join
The database builds a hash table from one table, then probes it with rows from the other table.
How it works:
Example with same query:
Build phase (departments):
Probe phase (employees):
Best for:
Large tables with no indexes
Equality joins (=) only
When you have sufficient memory
When tables are similar in size
Performance: O(n + m) where n and m are table sizes
3. Merge Join (Sort-Merge Join)
Both tables are sorted by the join key, then merged together like a zipper.
How it works:
Visual representation:
Best for:
Tables already sorted (or indexed) on join columns
Large tables
Non-equality joins (>, <, BETWEEN)
When output needs to be sorted anyway
Performance: O(n log n + m log m) if sorting needed, O(n + m) if already sorted
Practical Example Comparison
Let's say you're joining orders (1 million rows) with customers (100,000 rows):
Query Optimizer's Decision Process:
Nested Loop: Used if the WHERE clause filters orders down to ~100 rows. Then it's efficient to loop through customers 100 times.
Hash Join: Used if no good indexes exist. Build hash table on customers (smaller table), probe with orders. Fast for large datasets.
Merge Join: Used if both tables have indexes on
customer_id. No sorting needed, just merge the sorted data streams.
How to See What Your Database Is Using
Look for keywords like:
"Nested Loop"
"Hash Join" or "Hash Match"
"Merge Join"
Key Takeaways
You don't choose physical joins directly—the query optimizer does
Indexes matter hugely: They can make nested loops efficient and enable merge joins
Statistics matter: The optimizer uses table statistics to choose the best algorithm
Table size matters: Small tables favor nested loops, large tables favor hash/merge
Memory matters: Hash joins need RAM; if unavailable, optimizer may choose differently
The optimizer considers costs like I/O operations, CPU usage, and memory before selecting the most efficient physical join for your query.
Last updated