One Big Table (OBT)

aka Wide Tables




Wide Tables / One-Big-Table — What They Are

A Wide Table (also called One Big Table, Single Table, Flattened Table, or Denormalized Table) is a modeling strategy where you put almost all fields into a single, very wide, denormalized table, often with:

  • Hundreds or thousands of columns

  • Nested structures (JSON, arrays)

  • Sparse data (many nulls)

  • Very few joins — ideally none

The grain is usually something like:

  • order

  • customer-day

  • session

  • event

  • product snapshot

  • transaction

…and everything related to that grain is embedded directly into the same row.

This approach became popular in the era of cloud columnar databases.


Why Wide Tables Exist (Modern Reasons)

Cheap storage

In cloud data warehouses:

  • Storage is extremely cheap

  • Compute is separate from storage

  • It’s cheaper to store denormalized data than to engineer a perfect normalized model

This removes the historical pressure behind strict Kimball/Inmon modeling.


Columnar storage makes sparse data fast

Columnar engines (BigQuery, Snowflake, Redshift, Databricks) read only the columns used in a query.

If 95% of columns are null for most rows:

  • null columns cost almost nothing

  • column pruning makes scans fast

  • you don’t pay to read unused fields

This makes wide schemas efficient — something that relational row-based engines struggled with.
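
As a sketch (table and column names are hypothetical), a query that references three columns of a 500-column wide table scans only those three:

```sql
-- Only the three referenced columns are read from storage; the
-- other ~500 columns, however sparse, are never scanned.
SELECT order_id,
       loyalty_tier,
       gift_message
FROM   orders_wide               -- hypothetical 500-column OBT
WHERE  gift_message IS NOT NULL; -- a mostly-null (sparse) column
```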


Flexible and evolving schemas

Modern systems produce:

  • JSON

  • events

  • semi-structured logs

  • rapidly evolving fields

Adding new attributes constantly is common.

In columnar storage:

  • adding a column = metadata change

  • data for that column is stored only when written

This enables schema evolution with almost no overhead.
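
For example (illustrative table and column names; BigQuery-style syntax), adding a column touches only table metadata:

```sql
-- Metadata-only operation: existing rows are not rewritten, and
-- storage for the new column is allocated only as values arrive.
ALTER TABLE orders_wide
  ADD COLUMN referral_source STRING;
```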


Fewer joins = faster queries

Joins are expensive:

  • shuffle in massive distributed systems

  • potential for skew

  • SQL complexity

  • operational overhead

A wide table removes joins entirely.

Analytics teams often prefer:

“Give me everything in one place and let me query.”

This often runs faster than joining 10–15 tables.
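
A sketch of the contrast, with all table and column names hypothetical:

```sql
-- Star schema: every join is a potential shuffle across workers.
SELECT d.calendar_month,
       c.customer_segment,
       SUM(f.order_total) AS revenue
FROM   fact_orders  AS f
JOIN   dim_customer AS c ON c.customer_key = f.customer_key
JOIN   dim_date     AS d ON d.date_key     = f.order_date_key
GROUP  BY d.calendar_month, c.customer_segment;

-- Wide table: the same answer with zero joins.
SELECT calendar_month,
       customer_segment,
       SUM(order_total) AS revenue
FROM   orders_wide
GROUP  BY calendar_month, customer_segment;
```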


How Wide Tables Are Built

Typically you ingest raw data and then:

  • flatten nested JSON

  • join related entities (facts + dimensions)

  • explode arrays if necessary

  • produce a wide table keyed by a business grain
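
A condensed sketch of such a build (BigQuery-style syntax; the source tables and fields are hypothetical, and the resulting grain here is one row per order item):

```sql
CREATE OR REPLACE TABLE analytics.orders_wide AS
SELECT
  o.order_id,
  o.order_date,
  c.customer_name,        -- dimension fields denormalized inline
  c.loyalty_tier,
  p.product_category,
  item.sku,               -- arrays exploded to one row per item
  item.quantity,
  JSON_EXTRACT_SCALAR(o.raw_event, '$.utm.source') AS channel_source
FROM raw.orders AS o
JOIN raw.customers AS c ON c.customer_id = o.customer_id
CROSS JOIN UNNEST(o.items) AS item
JOIN raw.products AS p ON p.sku = item.sku;
```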

Over time, engineers append more fields based on new requirements:

  1. Extract raw data.

  2. Add new columns as new data appears.

  3. Denormalize relationships.

  4. Keep building horizontally.

  5. End up with 200–2000 columns.

This is incremental and emergent, not pre-modeled.


Example: Wide Table for Orders

Imagine a table with grain = (OrderID, CustomerID, OrderDate).

It may contain:

  • Customer name, email, demographics

  • Product info

  • Shipping address

  • Payment details

  • Discounts

  • Channel info

  • Loyalty tier

  • Derived metrics

  • Arrays of purchased items

  • Nested JSON from events

  • Flags, booleans, nulls

  • Metadata fields

  • Dozens of timestamps

You might end up with a table like:
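Here is a trimmed-down DDL sketch (BigQuery-style types; every column name is illustrative, and a real OBT would carry hundreds more):

```sql
CREATE TABLE analytics.orders_wide (
  order_id         STRING,
  customer_id      STRING,
  order_date       TIMESTAMP,
  customer_email   STRING,
  customer_segment STRING,
  loyalty_tier     STRING,
  shipping_city    STRING,
  payment_method   STRING,
  discount_pct     FLOAT64,
  order_total      FLOAT64,
  is_gift          BOOL,
  items            ARRAY<STRUCT<sku STRING, quantity INT64, price FLOAT64>>, -- nested items
  raw_event        JSON,      -- original event payload
  -- ...plus dozens of timestamps, flags, and metadata fields
  loaded_at        TIMESTAMP
);
```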

This is perfectly valid in a cloud columnar system.


Strengths of Wide Tables

Extreme simplicity

  • Analysts can query one place

  • No complex joins

  • No dimensional modeling

  • No conformed dimensions

  • No integration headaches


Great performance for typical BI workloads

Analytical queries often access:

  • 5–20 columns

  • aggregate by date, customer, product

Columnar pruning makes this extremely efficient.
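
For instance (hypothetical names), a typical dashboard query touches only a handful of columns:

```sql
-- Scan cost is proportional to the columns referenced,
-- not to the total width of the table.
SELECT DATE_TRUNC(DATE(order_date), MONTH) AS month,
       customer_segment,
       SUM(order_total) AS revenue
FROM   analytics.orders_wide
GROUP  BY month, customer_segment
ORDER  BY month;
```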


Perfect for:

  • event-driven systems

  • clickstream data

  • ecommerce order pipelines

  • log analysis

  • real-time pipelines

  • machine learning feature store–style queries

  • denormalized snapshots


Weaknesses and Criticisms

Loss of business logic

This is the big one.

Kimball insists on:

  • conformed dimensions

  • shared business definitions

  • carefully modeled facts

Wide tables ignore this entirely.

This leads to:

  • duplicated logic

  • inconsistent metric definitions

  • unclear lineage

  • hard-to-govern schemas

  • ambiguous meaning of fields


Update performance

Updating nested fields or arrays can be slow:

  • replacing a nested array may rewrite a whole block

  • partial updates are often inefficient
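
As an illustration of why (BigQuery-style syntax, hypothetical names): changing even one value inside a nested array means rebuilding the whole array, and the engine typically rewrites every storage block the row lives in:

```sql
-- Correct one item's quantity: the entire items array is
-- reconstructed, and the affected blocks are rewritten.
UPDATE analytics.orders_wide
SET items = ARRAY(
  SELECT AS STRUCT item.* REPLACE (
    IF(item.sku = 'SKU-42', 0, item.quantity) AS quantity)
  FROM UNNEST(items) AS item
)
WHERE order_id = 'o-1001';
```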


Harder long-term maintenance

As columns grow:

  • documentation becomes unwieldy

  • quality varies

  • implicit relationships vanish

  • governance tools struggle

The table becomes a “junk drawer” for everything.


Hard to ensure correctness

Without modeling:

  • you don’t enforce integrity

  • business rules become scattered

  • analysts may interpret fields differently

This can cause metric drift.
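
A small illustration (hypothetical columns): two analysts derive "revenue" from the same wide table and silently diverge:

```sql
-- Analyst A: gross revenue, shipping included.
SELECT SUM(order_total + shipping_fee) AS revenue
FROM   analytics.orders_wide;

-- Analyst B: net revenue, refunds excluded. Both numbers ship
-- to dashboards under the same name, "revenue".
SELECT SUM(order_total) AS revenue
FROM   analytics.orders_wide
WHERE  is_refunded = FALSE;
```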


When Wide Tables Make Sense

Use them when:

  • Speed > correctness

  • Agility > governance

  • Schema evolves rapidly

  • Data is semi-structured

  • You’re building prototypes or exploratory analytics

  • You need very fast BI queries

  • The organization is small or doesn’t have formal modeling resources

  • Data arrives from streaming sources or events

In many modern orgs, wide tables serve as the source of marts, not the final form.


When NOT to Use Them

Avoid them when:

  • You need conformed metrics

  • Integrity and governance matter

  • Many teams depend on consistent definitions

  • You have many changing business rules

  • You need traceability over time

  • Data is used for finance or regulatory reporting

For these, Kimball or Inmon models, or Data Vault 2.0 feeding star schemas, are the safer choice.


Wide Table as Part of a Modern Stack

Typical pipeline:

Raw → Wide Table (flattened) → BI Views / Aggregates / ML Features

or

Data Vault → Wide Table (Consumption Layer)

or

Bronze → Silver (wide tables) → Gold (dimensional models) (in Lakehouse style)

Wide tables often serve as the middle / flexible zone.


Summary

Wide tables are:

  • denormalized

  • massive

  • sparse

  • columnar-friendly

  • schema-evolving

  • fast for analytics

  • low-governance

  • high-flexibility

  • low-logic

They represent the relaxed, cloud-era alternative to strict modeling.

They're incredibly useful — in the right contexts — but cannot replace rigorous modeling where consistency and governance matter.


Concrete Build Pipeline Example (Bronze → Silver → OBT → Gold)

  • Bronze: raw ingestion, no modeling, immutable (e.g., raw JSON/CSV in cloud storage, event logs)

  • Silver: cleaned, flattened, deduplicated data (e.g., flattened Parquet/Delta tables)

  • OBT / Wide Table: denormalized, wide, query-ready table (e.g., a columnar table with hundreds to thousands of columns)

  • Gold: business-facing aggregates, BI dashboards, ML features (e.g., star schemas, aggregated views, dashboards)

Step-by-Step Example:

  1. Bronze:

    • Raw ingestion from multiple sources.

    • Keep every field and nested structure.

    • Example: raw order events, JSON payloads.

  2. Silver:

    • Flatten JSON arrays, clean nulls, normalize timestamps.

    • Deduplicate events if necessary.

    • Example: customer order + item details flattened into one row per item.

  3. Wide Table / OBT:

    • Denormalize facts + dimensions into one huge table.

    • Include customer info, order info, product info, campaign info, etc.

    • Columnar storage allows sparse columns and fast analytics.

  4. Gold:

    • Aggregate key metrics for dashboards.

    • Build star schemas or ML-ready feature tables.

    • Example: Sales by region, top products per customer segment.
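
A closing sketch of step 4 (BigQuery-style syntax, hypothetical names): a Gold-layer aggregate is just a query over the OBT:

```sql
-- Gold: a business-facing aggregate derived from the wide table.
CREATE OR REPLACE TABLE gold.sales_by_region_month AS
SELECT
  region,
  DATE_TRUNC(DATE(order_date), MONTH) AS month,
  COUNT(DISTINCT order_id) AS orders,
  SUM(order_total)         AS revenue
FROM analytics.orders_wide
GROUP BY region, month;
```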

