OLAP/Data cube

What's an OLAP cube? 🎲 - Analytics Engineers ClubAnalytics Engineers Club

Data cubes | Metabase LearnMetabase | Business Intelligence, Dashboards, and Data Visualization

What is an OLAP Cube?

An OLAP (Online Analytical Processing) Cube, or Data Cube, is a data structure optimized for very fast data analysis.

Think of a standard Excel spreadsheet: it is two-dimensional (rows and columns). You might have Products as rows and Months as columns, with sales figures in the cells.

Now, imagine adding a third dimension, like Region (North, South, East, West). You can’t easily fit this into a single flat sheet; you would need a stack of sheets. If you add a fourth dimension (e.g., Sales Channel), it becomes even harder to visualize.

An OLAP Cube solves this by pre-calculating and storing data in multi-dimensional arrays. This allows you to answer questions like "What were the sales of Laptops (Product) in New York (Location) during December (Time)?" almost instantly, without the database having to scan millions of rows every time.

Key Components

Dimensions: The "edges" of the cube that define perspective (e.g., Time, Geography, Product, Customer).
Measures: The numerical data inside the cube cells (e.g., Revenue, Units Sold, Profit).

OLAP Operations (Slice, Dice, and More)

Analysts navigate these cubes using specific "operations" to change their view of the data.

1. Slice

"One dimension is fixed."

A slice filters the cube by a single dimension, effectively cutting a 2D "slice" out of the 3D cube.

Action: You lock one dimension to a specific value.
Example: "Show me sales for all products in all regions, but ONLY for the year 2023."
Result: You get a flat 2D page of data (Products vs. Regions).

2. Dice

"Multiple dimensions are constrained."

A dice selects a specific sub-cube by filtering on two or more dimensions.

Action: You restrict the range of multiple dimensions simultaneously.
Example: "Show me sales ONLY for Laptops and Phones, ONLY in the USA and Canada, for Q1 and Q2."
Result: You get a smaller, more focused 3D cube (a sub-cube) rather than a 2D slice.

3. Drill Down

"Zoom In (More Detail)."

Navigating from summary data to detailed data.

Example: You are looking at Annual sales. You "drill down" to see Quarterly sales, and then Monthly sales.
Tech view: Moving down the concept hierarchy (Year Month) or adding a new dimension (Country City).

4. Roll Up

"Zoom Out (Less Detail)."

The opposite of drill down. Aggregating detailed data into a summary.

Example: You are looking at Daily sales. You "roll up" to see the total Monthly sales.
Tech view: Moving up the concept hierarchy or removing a dimension.

5. Pivot (Rotate)

"Change the perspective."

Rotating the cube to view the data from a different angle.

Example: Your report shows Regions on the X-axis and Time on the Y-axis. You "pivot" to swap them, so now Time is on the X-axis and Regions are on the Y-axis.

Summary Table

Operation

Concept

Output Shape

Analogy

Slice

Filter on 1 dimension

2D Plane

Pulling one slice of bread from a loaf.

Dice

Filter on 2+ dimensions

Smaller 3D Cube

Cutting a small block of cheese from a big block.

Drill Down

Increase detail

More Granular

Zooming in on a map.

Roll Up

Decrease detail

Less Granular

Zooming out on a map.

This video provides a visual walkthrough of these specific operations (Slice, Dice, Roll-up, Drill-down), which is often easier to understand than text alone when dealing with multi-dimensional concepts.

Relationship between OLAP cube and Start Schema

Here is how the conceptual "3D Cube" maps to the physical Star Schema and how you execute those OLAP operations using SQL.

1. The Star Schema vs. The Cube

In modern Data Warehousing (especially the Kimball methodology you might be familiar with), we rarely build physical proprietary "Cubes" anymore. Instead, we use a Star Schema in a relational database to logically represent the cube.

Fact Table = The Cells: The center of the star. It holds the metrics (Measures) like SalesAmount or Quantity.
Dimension Tables = The Edges: The points of the star. They hold the context (Dimensions) like DimDate, DimProduct, and DimStore.

When you join a Fact table to its Dimensions, you essentially "construct" the cube at query time.

2. Translating Concepts to SQL

Let's imagine a simple Star Schema:

FactSales: DateKey, ProductKey, StoreKey, SalesAmount
DimDate: DateKey, Year, Quarter, Month
DimProduct: ProductKey, Category, ProductName
DimStore: StoreKey, Region, City

A. Slice (Filter on 1 Dimension)

SQL Equivalent: A WHERE clause filtering on a single dimension column.

Concept: "Show me data for just the year 2023."

SQL:

SELECT 
    p.Category,
    SUM(f.SalesAmount) as TotalRevenue
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
WHERE d.Year = 2023  -- <--- The Slice
GROUP BY p.Category;

B. Dice (Filter on 2+ Dimensions)

SQL Equivalent: A WHERE clause with multiple AND conditions across different dimensions.

Concept: "Show me data for 2023 (Time) AND Electronics (Product)."

SQL:

SELECT 
    s.Region,
    SUM(f.SalesAmount) as TotalRevenue
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
JOIN DimStore s ON f.StoreKey = s.StoreKey
WHERE d.Year = 2023              -- <--- Dice Condition 1
  AND p.Category = 'Electronics' -- <--- Dice Condition 2
GROUP BY s.Region;

C. Drill Down (Add Granularity)

SQL Equivalent: Adding columns to the SELECT list and the GROUP BY clause.

Concept: You are looking at Annual sales and want to see Monthly sales.

SQL Transformation:

Before (High Level): GROUP BY d.Year
After (Drill Down): GROUP BY d.Year, d.Month

SELECT 
    d.Year,
    d.Month, -- <--- Added for Drill Down
    SUM(f.SalesAmount) as TotalRevenue
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
GROUP BY d.Year, d.Month; -- <--- Added for Drill Down

D. Roll Up (Remove Granularity)

SQL Equivalent: Removing columns from the GROUP BY clause OR using the ROLLUP operator.

Concept: You are looking at City sales and want to zoom out to Region totals.
SQL (Standard): Simply remove City from the GROUP BY.

SQL (Advanced): Using ROLLUP to get both the detailed rows and the sub-totals in one result.

SELECT 
    s.Region,
    s.City,
    SUM(f.SalesAmount) as TotalRevenue
FROM FactSales f
JOIN DimStore s ON f.StoreKey = s.StoreKey
-- This calculates total for City, THEN total for Region, THEN Grand Total
GROUP BY ROLLUP(s.Region, s.City);

E. Pivot (Rotate)

SQL Equivalent: Conditional Aggregation (Case Statements) or PIVOT function.

SQL naturally returns data in rows (vertical). To "pivot" it so that values become column headers (horizontal), we usually use CASE statements.

Concept: Compare sales of Years (rows) side-by-side.

SQL:

SELECT 
    p.Category,
    SUM(CASE WHEN d.Year = 2022 THEN f.SalesAmount ELSE 0 END) as Sales_2022,
    SUM(CASE WHEN d.Year = 2023 THEN f.SalesAmount ELSE 0 END) as Sales_2023
FROM FactSales f
JOIN DimDate d ON f.DateKey = d.DateKey
JOIN DimProduct p ON f.ProductKey = p.ProductKey
GROUP BY p.Category;

OLAP Concept

SQL Component

Cube

Fact Table + JOIN Dimensions

Slice

WHERE (One condition)

Dice

WHERE (Multiple AND conditions)

Drill Down

Add columns to GROUP BY

Roll Up

Remove columns from GROUP BY or use ROLLUP()

Pivot

CASE WHEN... or PIVOT() function

Modern cube

Here is the breakdown of how "the Cube" survived the modernization of the data stack and where tools like dbt and SQLMesh fit in.

The Shift: Physical vs. Logical Cubes

The Old Way (Physical Cubes / MOLAP)

In the 90s/00s (e.g., Microsoft SSAS, Cognos), you had to physically build a cube.

Process: You ran a nightly job that read your SQL tables and pre-calculated every possible combination of aggregations (Sum of Sales by Year, by Month, by City, by Product, etc.).
Result: A proprietary binary file.
Pros: Fast reads.
Cons: "Building the cube" took hours. If you needed real-time data or had massive data volume, the cube build would fail.

The Modern Way (Logical Cubes / ROLAP)

Today, with fast cloud warehouses (Snowflake, BigQuery, Databricks), we don't need to pre-calculate everything physically.

Process: You keep the data in a Star Schema (Relational tables).
The "Cube": It is now a code definition (a Semantic Layer) that tells the tool: "Here is how you join these tables, and here is how you calculate Revenue."
Result: When a user "Slices" by City, the engine generates a fresh SQL query on the fly (or uses a smart cache).
Verdict: The "Cube" concept exists as a user experience (Slice/Dice), but the storage is just standard database tables.

Where dbt and SQLMesh fit?

dbt and SQLMesh are primarily Transformation Tools, not "Cube Serving" tools. However, they are the critical first step in the "Modern Cube" pipeline.

A. The Builders: dbt & SQLMesh

Their job is to build the Star Schema (Fact and Dimension tables) inside your Data Warehouse.

What they do: They take raw messy data and clean it into the FactSales and DimProduct tables we discussed earlier.
Role: They create the foundation for the cube. Without clean Fact/Dim tables, you can't have a cube.
Recent shift: dbt introduced a "Semantic Layer" feature. This allows you to define metrics (e.g., sum(sales)) directly in your dbt code. This moves dbt closer to being a "Cube definition" tool, but its core strength is still building the tables.

B. The "Modern Cube" Tools (Semantic Layers)

These are the tools that actually strictly implement the "Cube" logic (Dimensions, Measures, Drill-downs) on top of the tables built by dbt/SQLMesh.

Cube (formerly Cube.js):
- Literally named after the concept.
- It sits between your database and your app.
- You define the "Cube" in JavaScript/YAML files (Measures, Dimensions, Joins).
- It handles the caching and SQL generation (Slice/Dice) for you. It connects to the tables dbt created.
Looker (LookML):
- Uses a language called LookML to define dimensions and measures.
- It treats the underlying database like a virtual cube.
AtScale:
- Enterprise tool that lets you design a cube visually, but leaves the data in the warehouse (Virtualization).

The Modern "Cube" Workflow

Here is how a Data Engineer (like the role you are targeting) typically sets this up today:

Stage

Action

Tool Example

1. Storage

Store raw data

S3, ADLS (Data Lake)

2. Compute/Warehousing

Process data

Spark, Snowflake, BigQuery

3. Modeling (The Star)

Build Fact/Dim Tables

dbt, SQLMesh

4. Semantic Layer (The Cube)

Define Metrics & Joins

Cube.dev, Looker, dbt Semantic Layer

5. Presentation

Slice & Dice Visualization

Tableau, PowerBI, React App

Is it still a cube? Yes, conceptually. We still say "building a cube" or "designing the cube" to refer to the multi-dimensional model, even if it's not a physical file anymore.
dbt/SQLMesh: They build the tables (Star Schema) that the cube reads from.
Semantic Layer: This is the software that provides the "Cube Experience" (API) to the end users.

PreviousOne Big Table (OBT)NextBatch architectures

Last updated 1 month ago

hashtagWhat is an OLAP Cube?

hashtagKey Components

hashtagOLAP Operations (Slice, Dice, and More)

hashtagSummary Table

hashtagRelationship between OLAP cube and Start Schema

hashtag1. The Star Schema vs. The Cube

hashtag2. Translating Concepts to SQL

hashtagModern cube

hashtagThe Shift: Physical vs. Logical Cubes

hashtagWhere dbt and SQLMesh fit?

hashtagThe Modern "Cube" Workflow

What is an OLAP Cube?

Key Components

OLAP Operations (Slice, Dice, and More)

Summary Table

Relationship between OLAP cube and Start Schema

1. The Star Schema vs. The Cube

2. Translating Concepts to SQL

Modern cube

The Shift: Physical vs. Logical Cubes

Where dbt and SQLMesh fit?

The Modern "Cube" Workflow