Filtering: WHERE clause
The WHERE Clause
WHERE ClauseThe WHERE clause is used to filter records before any grouping or aggregation occurs. It acts as a gatekeeper, determining which rows are pulled from the disk into memory.
Comparison Operators
Operator
Description
Example
=
Exact match
WHERE status = 'active'
!= or <>
Not equal to
WHERE category != 'test'
>, <, >=, <=
Range comparisons
WHERE price >= 100
BETWEEN
Inclusive range
WHERE age BETWEEN 18 AND 25
IN
Match any in a list
WHERE region IN ('US', 'CA', 'UK')
IS NULL
Check for empty values
WHERE email IS NULL
Advanced Pattern Matching
Data is often messy. You won't always have an exact match.
LIKE: Case-sensitive pattern matching.%= Multi-character wildcard._= Single-character wildcard.
ILIKE: (Postgres Specific) Case-insensitive version ofLIKE.SIMILAR TO: Uses SQL regular expressions.~: Uses POSIX Regular Expressions (the most powerful).
Handling NULLs: The Tri-state Logic
In SQL, NULL does not mean "zero" or "empty string"—it means unknown. This leads to Three-Valued Logic: TRUE, FALSE, and UNKNOWN.
The Trap:
NULL = NULLis notTRUE. It isUNKNOWN.
To filter safely, you must use IS NULL or IS NOT NULL. Alternatively, use COALESCE to provide a default value during the filter.
Logical Operators & Precedence
You can combine filters using AND, OR, and NOT.
Crucial Note on Parentheses: AND has higher precedence than OR. Always use parentheses to avoid "Leaky Filters."
Performance: SARGable Queries
In Data Engineering, we care about SARGable (Search ARGumentable) queries. This means writing filters in a way that allows the Query Optimizer to use an index.
Non-SARGable:
WHERE YEAR(created_at) = 2024(The database has to run a function on every single row, forcing a Table Scan).SARGable:
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'(The database can use a B-Tree Index on thecreated_atcolumn directly).
Filtering with DISTINCT
DISTINCTWhile not technically a WHERE clause, DISTINCT is a filter that removes duplicate rows from your result set.
Postgres Superpower:
DISTINCT ON (column). It returns the first row for each group of the specified column.
Summary
WHEREfilters rows;HAVINGfilters groups.COALESCEis vital for filtering columns that containNULL.Avoid functions on columns in the
WHEREclause if you want to keep your queries fast (SARGable).ILIKEand~(Regex) make Postgres much more flexible for string filtering than standard SQL.
Last updated