Data Vault 2.0

Books:

Building a Scalable Data Warehouse with Data Vault 2.0
The Elephant in the Fridge: Guided Steps to Data Vault Success through Building Business-Centered Models, 2019

Data Vault 2.0 Modeling Basics | RedgateRedgate

Data Vault 2.0 — What It Really Is

Data Vault 2.0 is a methodology and data modeling pattern designed to handle large, rapidly changing, and unpredictable data ecosystems.

It is built for:

frequent schema changes
messy or inconsistent source systems
incremental data arrival (batch or streaming)
multi-source integration
auditability
parallel processing

The guiding philosophy:

Capture everything, change nothing, load fast. Business logic belongs at query time, not ingestion time.

This is the opposite of Kimball (logic in marts) and different from Inmon (logic in normalized EDW).

Why Data Vault?

Traditional warehouse designs break when:

sources change keys
attributes shift between tables
new fields appear
relationships change
entities merge or split
sources contradict each other
loads need to be parallelized

Data Vault is designed to survive all of this by splitting structure from attributes, which is the heart of the model.

Architecture of Data Vault 2.0

The Three Core Table Types

Hubs — the business keys

A hub anchors the model around a meaningful business identifier:

CustomerID
ProductID
OrderID
EmployeeID

Hubs contain:

Hash key (surrogate PK, calculated via MD5/SHA)
Business key (natural key)
Load date
Record source

Insert-only. Never updated.

They answer:

“What are the core business entities we track over time?”

Links — the relationships

Links capture associations between hubs:

Order ↔ Product
Customer ↔ Address
Employee ↔ Department

They also use hash keys and load metadata.

Insert-only. Never updated.

They answer:

“How do business keys relate, without embedding logic?”

This makes relationships extremely flexible when the business changes.

Satellites — the context / attributes

Satellites store the descriptive data about a hub or link:

Product name, category, price
Order status, channel
Customer demographic attributes

Contain:

Parent hash key
Load date
Record source
Attribute columns

Insert-only, historized. Old records remain. New attributes appear → simply create a new satellite.

They answer:

“What attributes did this business key have at this time, from this source?”

How You Query a Data Vault

Raw Data Vault is not convenient for analytics.

To get a usable dataset, you build a Business Vault or information mart on top:

Join hub
Join links
Join satellites
Apply business rules
Produce a dimensional model or view

Data Vault stores the raw truth; marts represent business truth.

Comparing Data Vault, Kimball, and Inmon

Inmon

Enterprise-wide integrated 3NF warehouse
Strong integration rules
Metadata-heavy
High initial cost
Hard to change

Kimball

Dimensional model (facts & dimensions)
Optimized for BI/analytics
Business logic defined early
ETL must conform data on load

Data Vault

Flexible
Parallel-load friendly
Schema-change friendly
Minimal business logic up front
Requires post-processing for usable analytics

Ideal modern pattern: Raw ingestion → Data Vault → Star schemas (marts)

Modern Architecture Fit (Warehouse / Lakehouse)

Data Vault 2.0 works extremely well with:

BigQuery
Snowflake
Databricks
Redshift
Synapse

DNS (“Do Nothing Stupid”) rule of DV2.0 = Use the platform strengths: micro-batches, streams, ELT.

Worked Example — Orders & Products

We’ll take your scenario:

Products
Orders
Order ↔ Product relationship

Step 1 — Build Hubs

HubProduct HubOrder

Tables contain hash keys, business keys, load metadata.

Example:

HubProduct:

ProductHashKey | LoadDate     | RecordSource | ProductID
----------------------------------------------------------------
4041fd8...     | 2020-01-02   | ERP          | 1
de84355...     | 2021-03-09   | ERP          | 2
cf27369...     | 2021-03-09   | ERP          | 3

HubOrder:

OrderHashKey   | LoadDate     | RecordSource | OrderID
----------------------------------------------------------------
f89913...      | 2022-03-01   | Website      | 100
38b3ef...      | 2022-03-01   | Website      | 101
ec8956...      | 2022-03-01   | Website      | 102

Step 2 — Build a Link

LinkOrderProduct connects orders to products:

OrderProductHashKey | LoadDate | RecordSource | ProductHashKey | OrderHashKey
------------------------------------------------------------------------------
ff64ec1...          | 2022-03-01 | Website     | 4041fd8...     | f89913...
ff64ec1...          | 2022-03-01 | Website     | de84355...     | f89913...
e232628...          | 2022-03-01 | Website     | cf27369...     | 38b3ef...
26166a5...          | 2022-03-01 | Website     | 4041fd8...     | ec8956...

Step 3 — Add Satellites

Attributes about products go into SatelliteProduct:

ProductHashKey | LoadDate     | RecordSource | ProductName      | Price
-------------------------------------------------------------------------
4041fd8...     | 2020-01-02   | ERP          | Thingamajig      | 50
de84355...     | 2021-03-09   | ERP          | Whatchamacallit  | 25
cf27369...     | 2021-03-09   | ERP          | Whozeewhatzit    | 75

New attributes (e.g., category, weight) would simply go into another satellite.

How You Turn This Into a Star Schema

To build analytics tables:

Fact table

join LinkOrderProduct → join HubOrder → join HubProduct

Dimensions

ProductDim: join HubProduct with SatelliteProduct OrderDim: join HubOrder with its satellites (if exist)

Final BI schema looks like Kimball, but sourced from Data Vault.

When to Use Data Vault 2.0

Ideal when:

Many source systems
Constant, unpredictable change
Need traceability / auditability
Want ingestion speed above all
ELT tools and cloud warehouse available
Governance and lineage matter

Not ideal when:

You need a simple, fast BI warehouse
The business is small and stable
Modeling overhead needs to be minimal

PreviousComparison of Kimball architecture to its alternatives NextCore concepts from the book "Building a Scalable Data Warehouse with Data Vault 2.0 "

Last updated 1 month ago

hashtagData Vault 2.0 — What It Really Is

hashtagWhy Data Vault?

hashtagThe Three Core Table Types

hashtagHubs — the business keys

hashtagLinks — the relationships

hashtagSatellites — the context / attributes

hashtagHow You Query a Data Vault

hashtagComparing Data Vault, Kimball, and Inmon

hashtagInmon

hashtagKimball

hashtagData Vault

hashtagModern Architecture Fit (Warehouse / Lakehouse)

hashtagWorked Example — Orders & Products

hashtagStep 1 — Build Hubs

hashtagStep 2 — Build a Link

hashtagStep 3 — Add Satellites

hashtagHow You Turn This Into a Star Schema

hashtagFact table

hashtagDimensions

hashtagWhen to Use Data Vault 2.0

Data Vault 2.0 — What It Really Is

Why Data Vault?

The Three Core Table Types

Hubs — the business keys

Links — the relationships

Satellites — the context / attributes

How You Query a Data Vault

Comparing Data Vault, Kimball, and Inmon

Inmon

Kimball

Data Vault

Modern Architecture Fit (Warehouse / Lakehouse)

Worked Example — Orders & Products

Step 1 — Build Hubs

Step 2 — Build a Link

Step 3 — Add Satellites

How You Turn This Into a Star Schema

Fact table

Dimensions

When to Use Data Vault 2.0