GROUP BY and Aggregations


Aggregations are SQL operations that combine multiple rows of data into a single summary value. Instead of looking at individual records, you use aggregations to find totals, averages, or counts across a specific group.


Core Aggregate Functions

These functions perform a calculation on a set of values and return a single result.

Function

What it does

Data Type

COUNT()

Counts the number of rows or non-null values.

Any

SUM()

Adds up all values in a column.

Numeric

AVG()

Calculates the arithmetic mean.

Numeric

MIN()

Finds the smallest value.

Numeric, Date, String

MAX()

Finds the largest value.

Numeric, Date, String

Example:

If you have a table of sales, you can find the total revenue and the number of items sold with one query:

SELECT 
    SUM(price) AS total_revenue,
    COUNT(order_id) AS total_orders
FROM sales_table;

The GROUP BY Clause

Aggregations are most powerful when used with GROUP BY. This tells the database to "bucket" the data before performing the calculation.

The Golden Rule: Every column in your SELECT list that is not inside an aggregate function must appear in the GROUP BY clause.

Example:

To see the total sales per specific product category:


Filtering Aggregates with HAVING

You cannot use a WHERE clause to filter by the results of an aggregation. This is because WHERE filters individual rows before they are grouped. To filter the results after they are grouped, you use HAVING.

Example:

If you only want to see categories that have generated more than $5,000 in revenue:


Aggregating Unique Values (DISTINCT)

Sometimes you don't want to count every row; you only want to count the number of unique items. You can use the DISTINCT keyword inside the aggregate function.

Example:

To find out how many unique customers placed an order, rather than the total number of orders:


Summary Reference Table

Keyword

Purpose

GROUP BY

Groups rows that have the same values into summary rows.

HAVING

Filters groups based on an aggregate property (e.g., SUM > 100).

DISTINCT

Used inside a function to ignore duplicate values.


Last updated