One Big Table (OBT)
aka Wide Tables
What They Are
A Wide Table (also called One Big Table, Single Table, Flattened Table, or Denormalized Table) is a modeling strategy where you put almost all fields into a single, very wide, denormalized table, often with:
Hundreds or thousands of columns
Nested structures (JSON, arrays)
Sparse data (many nulls)
Very few joins — ideally none
The grain is usually something like:
order
customer-day
session
event
product snapshot
transaction
…and everything related to that grain is embedded directly into the same row.
This approach became popular in the era of cloud columnar databases.
Why Wide Tables Exist (Modern Reasons)
Cheap storage
In cloud data warehouses:
Storage is extremely cheap
Compute is separate from storage
It’s cheaper to store denormalized data than to engineer a perfect model
This removes the historical pressure behind strict Kimball/Inmon modeling.
Columnar storage makes sparse data fast
Columnar engines (BigQuery, Snowflake, Redshift, Databricks) read only the columns used in a query.
If 95% of columns are null for most rows:
null columns cost almost nothing
column pruning makes scans fast
you don’t pay to read unused fields
This makes wide schemas efficient — something that relational row-based engines struggled with.
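For instance, here is a minimal PySpark sketch (path and column names are made up) of the effect: a query that references three columns of a Parquet-backed wide table reads only those three column chunks, no matter how many hundreds of columns the table has.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pruning-demo").getOrCreate()

# Hypothetical wide table with hundreds of columns, stored as Parquet.
orders = spark.read.parquet("s3://lake/obt/orders_wide/")

# Only the three referenced column chunks are read from storage;
# the other columns (null-heavy or not) are never scanned.
daily_revenue = (
    orders
    .select("order_date", "channel", "order_total")
    .groupBy("order_date", "channel")
    .sum("order_total")
)
daily_revenue.show()
```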
Flexible and evolving schemas
Modern systems produce:
JSON
events
semi-structured logs
rapidly evolving fields
Adding new attributes constantly is common.
In columnar storage:
adding a column = metadata change
data for that column is stored only when written
This enables schema evolution with almost no overhead.
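A sketch of what that looks like with Delta Lake (illustrative path and columns; assumes the delta-spark package is installed, which the two config lines enable):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("schema-evolution-demo")
    # Assumes delta-spark is installed; these settings enable it.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/lake/obt/orders_wide"

# Day 1: the wide table has two columns.
spark.createDataFrame(
    [(1, "mobile_app")], ["order_id", "channel"]
).write.format("delta").mode("append").save(path)

# Day 2: events start carrying a new attribute. mergeSchema adds the
# column as a metadata-only change: no existing data files are
# rewritten, and old rows read back as null for the new column.
spark.createDataFrame(
    [(2, "web", "gold")], ["order_id", "channel", "loyalty_tier"]
).write.format("delta").mode("append") \
    .option("mergeSchema", "true").save(path)

spark.read.format("delta").load(path).show()
```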
Fewer joins = faster queries
Joins are expensive:
shuffles across massive distributed systems
potential for data skew
added SQL complexity
added operational overhead
A wide table removes joins entirely.
Analytics teams often prefer:
“Give me everything in one place and let me query.”
This often runs faster than joining 10–15 tables.
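As a sketch of the contrast (table and column names are invented, and assumed to be registered in the Spark catalog), the same question answered both ways:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("obt-vs-star").getOrCreate()

# Star schema: the question takes two joins, each a potential
# shuffle (and skew hazard) on a distributed engine.
star = spark.sql("""
    SELECT d.calendar_month, p.category, SUM(f.order_total) AS revenue
    FROM fact_orders f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.calendar_month, p.category
""")

# OBT: the same answer from one table. No joins, a single scan
# over three columns.
obt = spark.sql("""
    SELECT calendar_month, product_category, SUM(order_total) AS revenue
    FROM orders_wide
    GROUP BY calendar_month, product_category
""")
```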
How Wide Tables Are Built
Typically you ingest raw data and then:
flatten nested JSON
join related entities (facts + dimensions)
explode arrays if necessary
produce a wide table keyed by a business grain
Over time, engineers append more fields based on new requirements:
1. Extract raw data
2. Add new columns as new data appears
3. Denormalize relationships
4. Keep building horizontally
5. End up with 200–2000 columns
This is incremental and emergent, not pre-modeled.
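A condensed PySpark sketch of those steps, with hypothetical source paths, schemas, and join keys:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("build-obt").getOrCreate()

# 1. Extract raw, nested order events.
raw = spark.read.json("s3://lake/raw/order_events/")
customers = spark.read.parquet("s3://lake/silver/customers/")

# 2-4. Flatten nested structs, explode the items array (one output
# row per item), and denormalize the customer dimension onto the row.
wide = (
    raw
    .withColumn("item", F.explode("items"))           # array -> rows
    .select(
        "order_id",
        "customer_id",
        F.col("order.placed_at").alias("order_ts"),   # struct -> column
        F.col("item.sku").alias("sku"),
        F.col("item.qty").alias("qty"),
    )
    .join(customers, "customer_id", "left")           # fact + dimension
)

# 5. Requirements keep adding columns horizontally from here.
wide.write.mode("overwrite").parquet("s3://lake/obt/orders_wide/")
```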
Example: Wide Table for Orders
Imagine a table with grain = (OrderID, CustomerID, OrderDate).
It may contain:
Customer name, email, demographics
Product info
Shipping address
Payment details
Discounts
Channel info
Loyalty tier
Derived metrics
Arrays of purchased items
Nested JSON from events
Flags, booleans, nulls
Metadata fields
Dozens of timestamps
You might end up with a table like the following (columns and values purely illustrative):

| Column | Example value |
| --- | --- |
| order_id | 10293 |
| order_date | 2024-03-01 |
| customer_name | Ada Lovelace |
| customer_email | ada@example.com |
| loyalty_tier | gold |
| items | [{"sku": "A1", "qty": 2}, …] |
| payment_method | credit_card |
| discount_amount | 5.00 |
| channel | mobile_app |
| …hundreds more columns | … |
This is perfectly valid in a cloud columnar system.
Strengths of Wide Tables
Extreme simplicity
Analysts can query one place
No complex joins
No dimensional modeling
No conformed dimensions
No integration headaches
Great performance for typical BI workloads
Analytical queries often:
touch only 5–20 columns
aggregate by date, customer, or product
Columnar pruning makes this extremely efficient.
Perfect for:
event-driven systems
clickstream data
ecommerce order pipelines
log analysis
real-time pipelines
machine learning feature store–style queries
denormalized snapshots
Weaknesses and Criticisms
Loss of business logic
This is the big one.
Kimball insists on:
conformed dimensions
shared business definitions
carefully modeled facts
Wide tables ignore this entirely.
This leads to:
duplicated logic
inconsistent metric definitions
unclear lineage
hard-to-govern schemas
ambiguous meaning of fields
Update performance
Updating nested fields or arrays can be slow:
replacing a nested array may rewrite a whole block
partial updates are often inefficient
Harder long-term maintenance
As columns grow:
documentation becomes unwieldy
quality varies
implicit relationships vanish
governance tools struggle
The table becomes a “junk drawer” for everything.
Hard to ensure correctness
Without modeling:
you don’t enforce integrity
business rules become scattered
analysts may interpret fields differently
This can cause metric drift.
When Wide Tables Make Sense
Use them when:
Speed > correctness
Agility > governance
Schema evolves rapidly
Data is semi-structured
You’re building prototypes or exploratory analytics
You need very fast BI queries
The organization is small or doesn’t have formal modeling resources
Data arrives from streaming sources or events
In many modern orgs, wide tables serve as the source of marts, not the final form.
When NOT to Use Them
Avoid them when:
You need conformed metrics
Integrity and governance matter
Many teams depend on consistent definitions
You have many changing business rules
You need traceability over time
Data is used for finance or regulatory reporting
For these, Kimball, Inmon, or Data Vault 2.0 feeding star schemas is the safer choice.
Wide Table as Part of a Modern Stack
Typical pipeline:
Raw → Wide Table (flattened) → BI Views / Aggregates / ML Features
or
Data Vault → Wide Table (Consumption Layer)
or
Bronze → Silver (wide tables) → Gold (dimensional models) (in Lakehouse style)
Wide tables often serve as the middle / flexible zone.
Summary
Wide tables are:
denormalized
massive
sparse
columnar-friendly
schema-evolving
fast for analytics
low-governance
high-flexibility
low-logic
They represent the relaxed, cloud-era alternative to strict modeling.
They're incredibly useful — in the right contexts — but cannot replace rigorous modeling where consistency and governance matter.
Concrete Build Pipeline Example (Bronze → Silver → OBT → Gold)
| Layer | What it is | Typical form |
| --- | --- | --- |
| Bronze | Raw ingestion, no modeling, immutable | Raw JSON/CSV in cloud storage, event logs |
| Silver | Cleaned, flattened, deduplicated data | Flattened Parquet/Delta tables |
| OBT / Wide Table | Denormalized, wide, query-ready table | Columnar table with hundreds to thousands of columns |
| Gold | Business-facing aggregates, BI dashboards, ML features | Star schemas, aggregated views, dashboards |
Step-by-Step Example:
Bronze:
Raw ingestion from multiple sources.
Keep every field and nested structure.
Example: raw order events, JSON payloads.
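A minimal Bronze job might look like this in PySpark (paths are illustrative); the point is to persist payloads untouched, plus a little ingestion metadata:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze").getOrCreate()

# Land raw order events exactly as received: every field, every
# nested structure. No cleaning, no modeling.
raw = (
    spark.read.json("s3://landing/order_events/2024-03-01/")
    .withColumn("_ingested_at", F.current_timestamp())
    .withColumn("_source_file", F.input_file_name())
)
raw.write.mode("append").parquet("s3://lake/bronze/order_events/")
```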
Silver:
Flatten JSON arrays, clean nulls, normalize timestamps.
Deduplicate events if necessary.
Example: customer order + item details flattened into single table row per item.
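A Silver sketch under the same assumptions (the event schema here is invented):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("silver").getOrCreate()

bronze = spark.read.parquet("s3://lake/bronze/order_events/")

# One row per order item: explode the array, normalize the event
# timestamp, and drop duplicate deliveries of the same event.
silver = (
    bronze
    .withColumn("item", F.explode("items"))
    .select(
        "event_id",
        "order_id",
        "customer_id",
        F.to_timestamp("event_time").alias("event_ts"),
        F.col("item.sku").alias("sku"),
        F.col("item.quantity").alias("quantity"),
    )
    .dropDuplicates(["event_id", "sku"])
)
silver.write.mode("overwrite").parquet("s3://lake/silver/order_items/")
```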
Wide Table / OBT:
Denormalize facts + dimensions into one huge table.
Include customer info, order info, product info, campaign info, etc.
Columnar storage allows sparse columns and fast analytics.
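An OBT sketch (dimension tables and join keys are hypothetical): every dimension is left-joined onto the order-item grain, and attributes that don't apply to a row simply land as nulls.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("obt").getOrCreate()

orders    = spark.read.parquet("s3://lake/silver/order_items/")
customers = spark.read.parquet("s3://lake/silver/customers/")
products  = spark.read.parquet("s3://lake/silver/products/")
campaigns = spark.read.parquet("s3://lake/silver/campaigns/")

# Left-join every dimension onto the order-item grain. Missing
# attributes become nulls, which columnar storage handles cheaply.
obt = (
    orders
    .join(customers, "customer_id", "left")
    .join(products, "sku", "left")
    .join(campaigns, "campaign_id", "left")
)
obt.write.mode("overwrite").parquet("s3://lake/obt/orders_wide/")
```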
Gold:
Aggregate key metrics for dashboards.
Build star schemas or ML-ready feature tables.
Example: Sales by region, top products per customer segment.
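A Gold sketch for the sales-by-region example (column names assumed from the earlier steps):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gold").getOrCreate()

obt = spark.read.parquet("s3://lake/obt/orders_wide/")

# Sales by region and day: a dashboard-ready aggregate carved
# straight out of the wide table.
sales_by_region = (
    obt
    .groupBy("region", F.to_date("event_ts").alias("order_date"))
    .agg(F.sum(F.col("quantity") * F.col("unit_price")).alias("revenue"))
)
sales_by_region.write.mode("overwrite") \
    .parquet("s3://lake/gold/sales_by_region/")
```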