UNION and UNION ALL


In SQL, UNION and UNION ALL are used to combine the result sets of two or more SELECT statements into a single output.

Think of it as stacking one result set on top of another.


Key Requirements

For either command to work, your queries must follow these rules:

  • Each SELECT statement must have the same number of columns.

  • The columns must be in the same order.

  • The data types of corresponding columns must be compatible.


UNION ALL

UNION ALL combines all rows from both queries exactly as they are. If a record exists in both tables, it will appear twice in your final result.

When to use it

  • When you know there are no duplicates.

  • When you want to see duplicates.

  • Performance: Use this by default if you don't care about duplicates. It is much faster because the database doesn't have to scan the data to remove identical rows.

-- Combining two similar tables
SELECT city, zip_code FROM east_coast_customers
UNION ALL
SELECT city, zip_code FROM west_coast_customers;

UNION

UNION combines the result sets but removes duplicate rows. It keeps only unique records.

When to use it

  • When you want a clean list of unique values across multiple sources.

  • Performance Warning: It is slower than UNION ALL because the database must perform a "distinct" operation (sorting and comparing) to find and delete duplicates.


Comparison Table

Feature

UNION ALL

UNION

Duplicates

Keeps all duplicates.

Removes duplicates.

Performance

Faster (No extra processing).

Slower (Requires sorting/de-duping).

Result Size

Larger (Total sum of all rows).

Smaller (Only unique rows).

Common Use

Merging logs or raw staging data.

Creating unique master lists.


Adding a "Source" Column

A common Data Engineering practice when using these commands is to add a hardcoded string to identify where each row came from. This is incredibly helpful for debugging.


Last updated