OLAP/Data cube
What is an OLAP Cube?
An OLAP (Online Analytical Processing) Cube, or Data Cube, is a data structure optimized for very fast data analysis.
Think of a standard Excel spreadsheet: it is two-dimensional (rows and columns). You might have Products as rows and Months as columns, with sales figures in the cells.
Now, imagine adding a third dimension, like Region (North, South, East, West). You can’t easily fit this into a single flat sheet; you would need a stack of sheets. If you add a fourth dimension (e.g., Sales Channel), it becomes even harder to visualize.
An OLAP Cube solves this by pre-calculating and storing data in multi-dimensional arrays. This allows you to answer questions like "What were the sales of Laptops (Product) in New York (Location) during December (Time)?" almost instantly, without the database having to scan millions of rows every time.
Key Components
Dimensions: The "edges" of the cube that define perspective (e.g., Time, Geography, Product, Customer).
Measures: The numerical data inside the cube cells (e.g., Revenue, Units Sold, Profit).
OLAP Operations (Slice, Dice, and More)
Analysts navigate these cubes using specific "operations" to change their view of the data.
1. Slice
"One dimension is fixed."
A slice filters the cube by a single dimension, effectively cutting a 2D "slice" out of the 3D cube.
Action: You lock one dimension to a specific value.
Example: "Show me sales for all products in all regions, but ONLY for the year 2023."
Result: You get a flat 2D page of data (Products vs. Regions).
2. Dice
"Multiple dimensions are constrained."
A dice selects a specific sub-cube by filtering on two or more dimensions.
Action: You restrict the range of multiple dimensions simultaneously.
Example: "Show me sales ONLY for Laptops and Phones, ONLY in the USA and Canada, for Q1 and Q2."
Result: You get a smaller, more focused 3D cube (a sub-cube) rather than a 2D slice.
3. Drill Down
"Zoom In (More Detail)."
Navigating from summary data to detailed data.
Example: You are looking at Annual sales. You "drill down" to see Quarterly sales, and then Monthly sales.
Tech view: Moving down the concept hierarchy (Year Month) or adding a new dimension (Country City).
4. Roll Up
"Zoom Out (Less Detail)."
The opposite of drill down. Aggregating detailed data into a summary.
Example: You are looking at Daily sales. You "roll up" to see the total Monthly sales.
Tech view: Moving up the concept hierarchy or removing a dimension.
5. Pivot (Rotate)
"Change the perspective."
Rotating the cube to view the data from a different angle.
Example: Your report shows Regions on the X-axis and Time on the Y-axis. You "pivot" to swap them, so now Time is on the X-axis and Regions are on the Y-axis.
Summary Table
Operation
Concept
Output Shape
Analogy
Slice
Filter on 1 dimension
2D Plane
Pulling one slice of bread from a loaf.
Dice
Filter on 2+ dimensions
Smaller 3D Cube
Cutting a small block of cheese from a big block.
Drill Down
Increase detail
More Granular
Zooming in on a map.
Roll Up
Decrease detail
Less Granular
Zooming out on a map.
This video provides a visual walkthrough of these specific operations (Slice, Dice, Roll-up, Drill-down), which is often easier to understand than text alone when dealing with multi-dimensional concepts.
Relationship between OLAP cube and Start Schema
Here is how the conceptual "3D Cube" maps to the physical Star Schema and how you execute those OLAP operations using SQL.
1. The Star Schema vs. The Cube
In modern Data Warehousing (especially the Kimball methodology you might be familiar with), we rarely build physical proprietary "Cubes" anymore. Instead, we use a Star Schema in a relational database to logically represent the cube.
Fact Table = The Cells: The center of the star. It holds the metrics (Measures) like SalesAmount or Quantity.
Dimension Tables = The Edges: The points of the star. They hold the context (Dimensions) like DimDate, DimProduct, and DimStore.
When you join a Fact table to its Dimensions, you essentially "construct" the cube at query time.
2. Translating Concepts to SQL
Let's imagine a simple Star Schema:
FactSales:
DateKey,ProductKey,StoreKey,SalesAmountDimDate:
DateKey,Year,Quarter,MonthDimProduct:
ProductKey,Category,ProductNameDimStore:
StoreKey,Region,City
A. Slice (Filter on 1 Dimension)
SQL Equivalent: A WHERE clause filtering on a single dimension column.
Concept: "Show me data for just the year 2023."
SQL:
B. Dice (Filter on 2+ Dimensions)
SQL Equivalent: A WHERE clause with multiple AND conditions across different dimensions.
Concept: "Show me data for 2023 (Time) AND Electronics (Product)."
SQL:
C. Drill Down (Add Granularity)
SQL Equivalent: Adding columns to the SELECT list and the GROUP BY clause.
Concept: You are looking at Annual sales and want to see Monthly sales.
SQL Transformation:
Before (High Level):
GROUP BY d.YearAfter (Drill Down):
GROUP BY d.Year, d.Month
D. Roll Up (Remove Granularity)
SQL Equivalent: Removing columns from the GROUP BY clause OR using the ROLLUP operator.
Concept: You are looking at City sales and want to zoom out to Region totals.
SQL (Standard): Simply remove
Cityfrom theGROUP BY.SQL (Advanced): Using
ROLLUPto get both the detailed rows and the sub-totals in one result.
E. Pivot (Rotate)
SQL Equivalent: Conditional Aggregation (Case Statements) or PIVOT function.
SQL naturally returns data in rows (vertical). To "pivot" it so that values become column headers (horizontal), we usually use CASE statements.
Concept: Compare sales of Years (rows) side-by-side.
SQL:
OLAP Concept
SQL Component
Cube
Fact Table + JOIN Dimensions
Slice
WHERE (One condition)
Dice
WHERE (Multiple AND conditions)
Drill Down
Add columns to GROUP BY
Roll Up
Remove columns from GROUP BY or use ROLLUP()
Pivot
CASE WHEN... or PIVOT() function
Modern cube
Here is the breakdown of how "the Cube" survived the modernization of the data stack and where tools like dbt and SQLMesh fit in.
The Shift: Physical vs. Logical Cubes
The Old Way (Physical Cubes / MOLAP)
In the 90s/00s (e.g., Microsoft SSAS, Cognos), you had to physically build a cube.
Process: You ran a nightly job that read your SQL tables and pre-calculated every possible combination of aggregations (Sum of Sales by Year, by Month, by City, by Product, etc.).
Result: A proprietary binary file.
Pros: Fast reads.
Cons: "Building the cube" took hours. If you needed real-time data or had massive data volume, the cube build would fail.
The Modern Way (Logical Cubes / ROLAP)
Today, with fast cloud warehouses (Snowflake, BigQuery, Databricks), we don't need to pre-calculate everything physically.
Process: You keep the data in a Star Schema (Relational tables).
The "Cube": It is now a code definition (a Semantic Layer) that tells the tool: "Here is how you join these tables, and here is how you calculate Revenue."
Result: When a user "Slices" by City, the engine generates a fresh SQL query on the fly (or uses a smart cache).
Verdict: The "Cube" concept exists as a user experience (Slice/Dice), but the storage is just standard database tables.
Where dbt and SQLMesh fit?
dbt and SQLMesh are primarily Transformation Tools, not "Cube Serving" tools. However, they are the critical first step in the "Modern Cube" pipeline.
A. The Builders: dbt & SQLMesh
Their job is to build the Star Schema (Fact and Dimension tables) inside your Data Warehouse.
What they do: They take raw messy data and clean it into the
FactSalesandDimProducttables we discussed earlier.Role: They create the foundation for the cube. Without clean Fact/Dim tables, you can't have a cube.
Recent shift: dbt introduced a "Semantic Layer" feature. This allows you to define metrics (e.g.,
sum(sales)) directly in your dbt code. This moves dbt closer to being a "Cube definition" tool, but its core strength is still building the tables.
B. The "Modern Cube" Tools (Semantic Layers)
These are the tools that actually strictly implement the "Cube" logic (Dimensions, Measures, Drill-downs) on top of the tables built by dbt/SQLMesh.
Cube (formerly Cube.js):
Literally named after the concept.
It sits between your database and your app.
You define the "Cube" in JavaScript/YAML files (Measures, Dimensions, Joins).
It handles the caching and SQL generation (Slice/Dice) for you. It connects to the tables dbt created.
Looker (LookML):
Uses a language called LookML to define dimensions and measures.
It treats the underlying database like a virtual cube.
AtScale:
Enterprise tool that lets you design a cube visually, but leaves the data in the warehouse (Virtualization).
The Modern "Cube" Workflow
Here is how a Data Engineer (like the role you are targeting) typically sets this up today:
Stage
Action
Tool Example
1. Storage
Store raw data
S3, ADLS (Data Lake)
2. Compute/Warehousing
Process data
Spark, Snowflake, BigQuery
3. Modeling (The Star)
Build Fact/Dim Tables
dbt, SQLMesh
4. Semantic Layer (The Cube)
Define Metrics & Joins
Cube.dev, Looker, dbt Semantic Layer
5. Presentation
Slice & Dice Visualization
Tableau, PowerBI, React App
Is it still a cube? Yes, conceptually. We still say "building a cube" or "designing the cube" to refer to the multi-dimensional model, even if it's not a physical file anymore.
dbt/SQLMesh: They build the tables (Star Schema) that the cube reads from.
Semantic Layer: This is the software that provides the "Cube Experience" (API) to the end users.
Last updated
