# One Big Table (OBT)

***

[A short video about it](https://www.youtube.com/watch?v=1eB1Sy2rHtk)

***

### Wide Tables / One-Big-Table — What They Are

A **Wide Table** (also called **One Big Table**, **Single Table**, **Flattened Table**, or **Denormalized Table**) is a modeling strategy where **you put almost all fields into a single, very wide, denormalized table**, often with:

* Hundreds or thousands of columns
* Nested structures (JSON, arrays)
* Sparse data (many nulls)
* Very few joins — ideally none

The grain is usually something like:

* order
* customer-day
* session
* event
* product snapshot
* transaction

…and everything related to that grain is embedded directly into the same row.

This approach became popular in the era of cloud columnar databases.

***

### Why Wide Tables Exist (Modern Reasons)

#### Cheap storage

In cloud data warehouses:

* Storage is extremely cheap
* Compute is separate from storage
* It’s cheaper to store denormalized data than to engineer a perfectly normalized model

This removes much of the storage-cost pressure that historically motivated strict Kimball/Inmon modeling.

***

#### Columnar storage makes sparse data fast

Columnar engines (BigQuery, Snowflake, Redshift, Databricks) read only the columns used in a query.

If 95% of columns are null for most rows:

* null columns cost almost nothing
* column pruning makes scans fast
* you don’t pay to read unused fields

This makes wide schemas efficient — something that relational row-based engines struggled with.
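
As a rough illustration of column pruning, here is a minimal sketch using pyarrow against a hypothetical Parquet file; the path and column names are made up. Only the requested columns are read from disk, no matter how wide the table is.

```python
import pyarrow.parquet as pq

# Read only three columns from a (hypothetical) wide Parquet table.
# The remaining hundreds of columns, sparse or not, are never touched
# on disk, which is what makes the wide layout cheap to scan.
orders = pq.read_table(
    "warehouse/orders_wide.parquet",                       # hypothetical path
    columns=["OrderDate", "CampaignSource", "TotalAmount"],
)

# Aggregate over just the columns that were actually materialized.
revenue_by_day = orders.group_by("OrderDate").aggregate([("TotalAmount", "sum")])
print(revenue_by_day)
```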

***

#### Flexible and evolving schemas

Modern systems produce:

* JSON
* events
* semi-structured logs
* rapidly evolving fields

Adding new attributes constantly is common.

In columnar storage:

* adding a column = metadata change
* data for that column is stored only when written

This enables **schema evolution** with almost no overhead.
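
A minimal sketch of what that looks like in practice, assuming a Delta Lake table written from PySpark; the paths and the new `loyalty_tier` column are hypothetical. Appending with `mergeSchema` adds the column via a metadata change, and existing rows simply read back as null for it.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# A new batch arrives carrying an attribute the table has never seen.
new_batch = (
    spark.read.json("landing/orders/2024-06-01/")   # hypothetical path
    .withColumn("loyalty_tier", F.lit("gold"))      # hypothetical new column
)

# Appending with schema merge registers the new column in the table's
# metadata; older files are not rewritten, and the column is null there.
(new_batch.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("warehouse/orders_wide"))
```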

***

#### Fewer joins = faster queries

Joins are expensive:

* shuffle in massive distributed systems
* potential for skew
* SQL complexity
* operational overhead

A wide table removes joins from query time entirely; the joins are paid once, when the table is built.

Analytics teams often prefer:

> “Give me everything in one place and let me query.”

This often runs *faster* than joining 10–15 tables.
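
To make the contrast concrete, here is a hypothetical pair of queries over illustrative table and column names: the dimensional version joins before aggregating, while the wide-table version is a single scan of a few pruned columns.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Dimensional style: the aggregate needs joins (and therefore shuffles).
star = spark.sql("""
    SELECT d.order_month, c.segment, SUM(f.total_amount) AS revenue
    FROM fact_orders f
    JOIN dim_customer c ON f.customer_key   = c.customer_key
    JOIN dim_date     d ON f.order_date_key = d.date_key
    GROUP BY d.order_month, c.segment
""")

# Wide-table style: the same answer from one table, no join at query time.
wide = spark.sql("""
    SELECT date_trunc('month', OrderDate) AS order_month,
           CustomerSegment,
           SUM(TotalAmount) AS revenue
    FROM orders_wide
    GROUP BY date_trunc('month', OrderDate), CustomerSegment
""")
```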

***

### How Wide Tables Are Built

Typically you ingest raw data and then:

* flatten nested JSON
* join related entities (facts + dimensions)
* explode arrays if necessary
* produce a wide table keyed by a business grain

Over time, engineers append more fields based on new requirements:

| Step | Action                              |
| ---- | ----------------------------------- |
| 1    | Extract raw data                    |
| 2    | Add new columns as new data appears |
| 3    | Denormalize relationships           |
| 4    | Keep building horizontally          |
| 5    | End up with 200–2000 columns        |

This is **incremental and emergent**, not pre-modeled.
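
A minimal PySpark sketch of the steps above, assuming raw order events arrive as JSON with a nested `items` array and that a cleaned customer table already exists; every name and path here is illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Step 1: extract raw data (hypothetical paths).
orders_raw = spark.read.json("raw/order_events/")
customers  = spark.read.parquet("silver/customers/")

# Steps 2-3: flatten nested JSON, explode the items array, and
# denormalize the customer attributes onto every row.
order_items = (
    orders_raw
    .withColumn("item", F.explode("items"))              # one row per item
    .select(
        "order_id", "customer_id", "order_date",
        F.col("item.sku").alias("sku"),
        F.col("item.quantity").alias("quantity"),
        F.col("payment.method").alias("payment_method"),
    )
)

# Steps 4-5: keep joining / appending columns over time and persist
# the result as a columnar, query-ready wide table.
orders_wide = order_items.join(customers, on="customer_id", how="left")
orders_wide.write.format("delta").mode("overwrite").save("obt/orders_wide")
```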

***

### Example: Wide Table for Orders

Imagine a table with grain = **(OrderID, CustomerID, OrderDate)**.

It may contain:

* Customer name, email, demographics
* Product info
* Shipping address
* Payment details
* Discounts
* Channel info
* Loyalty tier
* Derived metrics
* Arrays of purchased items
* Nested JSON from events
* Flags, booleans, nulls
* Metadata fields
* Dozens of timestamps

You might end up with a table like:

```
OrderID | CustomerID | OrderDate | TotalAmount | CustomerName | Email |
Phone | CampaignSource | Items (array) | AddressJSON | ... (hundreds of fields)
```

This is perfectly valid in a cloud columnar system.
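
If the schema is declared explicitly, a small slice of it might look like the following Spark schema; only a handful of the hypothetical columns are shown, including a nested array of item structs and raw JSON kept as a plain string.

```python
from pyspark.sql.types import (
    ArrayType, DoubleType, StringType, StructField, StructType, TimestampType,
)

# A few of the hundreds of columns in the hypothetical wide orders table.
orders_wide_schema = StructType([
    StructField("OrderID", StringType()),
    StructField("CustomerID", StringType()),
    StructField("OrderDate", TimestampType()),
    StructField("TotalAmount", DoubleType()),
    StructField("CustomerName", StringType()),
    StructField("Email", StringType()),
    StructField("CampaignSource", StringType()),
    StructField("Items", ArrayType(StructType([        # nested array of structs
        StructField("sku", StringType()),
        StructField("quantity", DoubleType()),
        StructField("unit_price", DoubleType()),
    ]))),
    StructField("AddressJSON", StringType()),           # nested JSON kept raw
    # ... hundreds more columns appended over time
])
```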

***

### Strengths of Wide Tables

#### Extreme simplicity

* Analysts can query one place
* No complex joins
* No dimensional modeling
* No conformed dimensions
* No integration headaches

***

#### Great performance for typical BI workloads

Analytical queries typically:

* touch only 5–20 columns
* aggregate by date, customer, or product

Columnar pruning makes this extremely efficient.

***

#### Perfect for:

* event-driven systems
* clickstream data
* ecommerce order pipelines
* log analysis
* real-time pipelines
* machine learning feature store–style queries
* denormalized snapshots

***

### Weaknesses and Criticisms

#### Loss of business logic

This is the big one.

Kimball insists on:

* conformed dimensions
* shared business definitions
* carefully modeled facts

Wide tables ignore this entirely.

This leads to:

* duplicated logic
* inconsistent metric definitions
* unclear lineage
* hard-to-govern schemas
* ambiguous meaning of fields

***

#### Update performance

Updating nested fields or arrays can be slow:

* replacing a nested array may rewrite a whole block
* partial updates are often inefficient

***

#### Harder long-term maintenance

As columns grow:

* documentation becomes unwieldy
* quality varies
* implicit relationships vanish
* governance tools struggle

The table becomes a “junk drawer” for everything.

***

#### Hard to ensure correctness

Without modeling:

* you don’t enforce integrity
* business rules become scattered
* analysts may interpret fields differently

This can cause metric drift.

***

### When Wide Tables Make Sense

Use them when:

* Speed > correctness
* Agility > governance
* Schema evolves rapidly
* Data is semi-structured
* You’re building prototypes or exploratory analytics
* You need very fast BI queries
* The organization is small or doesn’t have formal modeling resources
* Data arrives from streaming sources or events

In many modern orgs, wide tables serve as the **source for downstream data marts**, not the final form.

***

### When NOT to Use Them

Avoid them when:

* You need conformed metrics
* Integrity and governance matter
* Many teams depend on consistent definitions
* You have many changing business rules
* You need traceability over time
* Data is used for finance or regulatory reporting

For these, Kimball, Inmon, or Data Vault 2.0 feeding star schemas is the safer choice.

***

### Wide Table as Part of a Modern Stack

Typical pipeline:

**Raw → Wide Table (flattened) → BI Views / Aggregates / ML Features**

or

**Data Vault → Wide Table (Consumption Layer)**

or

**Bronze → Silver (wide tables) → Gold (dimensional models)**\
(in Lakehouse style)

Wide tables often serve as the **middle / flexible zone**.

***

### Summary

Wide tables are:

* denormalized
* massive
* sparse
* columnar-friendly
* schema-evolving
* fast for analytics
* low-governance
* high-flexibility
* low-logic

They represent the **relaxed, cloud-era alternative** to strict modeling.

They're incredibly useful — **in the right contexts** — but cannot replace rigorous modeling where consistency and governance matter.

***

### Concrete Build Pipeline Example (Bronze → Silver → OBT → Gold)

| Layer            | Purpose                                                | Storage / Example                         |
| ---------------- | ------------------------------------------------------ | ----------------------------------------- |
| Bronze           | Raw ingestion, no modeling, immutable                  | Raw JSON/CSV in cloud storage, event logs |
| Silver           | Cleaned, flattened, deduplicated data                  | Flattened Parquet/Delta tables            |
| OBT / Wide Table | Denormalized, wide, query-ready table                  | Columnar table with 100–1000s of columns  |
| Gold             | Business-facing aggregates, BI dashboards, ML features | Star schema, aggregated views, dashboards |

**Step-by-Step Example:**

1. **Bronze:**
   * Raw ingestion from multiple sources.
   * Keep every field and nested structure.
   * Example: raw order events, JSON payloads.
2. **Silver:**
   * Flatten JSON arrays, clean nulls, normalize timestamps.
   * Deduplicate events if necessary.
   * Example: customer order + item details flattened into one row per item.
3. **Wide Table / OBT:**
   * Denormalize facts + dimensions into one huge table.
   * Include customer info, order info, product info, campaign info, etc.
   * Columnar storage allows sparse columns and fast analytics.
4. **Gold:**
   * Aggregate key metrics for dashboards.
   * Build star schemas or ML-ready feature tables.
   * Example: Sales by region, top products per customer segment.
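
A condensed PySpark sketch of those four layers, expanding the earlier build sketch into explicit layers and assuming Delta storage; all paths, columns, and the revenue metric are made up for illustration, and in practice each layer would be its own job.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw order events untouched.
bronze = spark.read.json("landing/order_events/")
bronze.write.format("delta").mode("append").save("bronze/order_events")

# Silver: flatten, clean, deduplicate; one row per order item.
silver = (
    spark.read.format("delta").load("bronze/order_events")
    .withColumn("item", F.explode("items"))
    .select(
        "order_id", "customer_id",
        F.to_timestamp("order_date").alias("order_ts"),
        F.col("item.sku").alias("sku"),
        F.col("item.quantity").alias("quantity"),
        F.col("item.unit_price").alias("unit_price"),
    )
    .dropDuplicates(["order_id", "sku"])
)
silver.write.format("delta").mode("overwrite").save("silver/order_items")

# OBT: denormalize the customer dimension onto the item grain.
customers = spark.read.format("delta").load("silver/customers")
obt = silver.join(customers, on="customer_id", how="left")
obt.write.format("delta").mode("overwrite").save("obt/orders_wide")

# Gold: business-facing aggregate, e.g. revenue by region and month.
gold = (
    obt.groupBy(F.date_trunc("month", "order_ts").alias("month"), "region")
       .agg(F.sum(F.col("quantity") * F.col("unit_price")).alias("revenue"))
)
gold.write.format("delta").mode("overwrite").save("gold/revenue_by_region_month")
```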

***
