OLTP/OLAP systems

Storage Layout: Row vs. Column

The physical orientation of data on disk determines which access patterns are "cheap" and which are "expensive."

Row-Oriented (Horizontal Partitioning)

Physical Layout: Fields of the same record are stored contiguously in a single page.
- Example: [ID1, Name1, Age1], [ID2, Name2, Age2]
Best For: Point Lookups (Accessing a specific record by ID) and Heavy Writes (Appending a full record).
Advantages: * One I/O seek retrieves the entire record.
- Simple indexing (Primary Keys point to a specific file offset).
Disadvantages: * "I/O Bloat": To calculate AVG(age), the DB must read Name and ID into memory even though they aren't needed.
- Lower compression due to mixed data types in a single block.

Column-Oriented (Vertical Partitioning)

Physical Layout: Values of the same column are stored together across pages.
- Example: Names: [Alice, Bob]; Ages: [30, 25]
Best For: Aggregations (SUM, AVG) and Broad Scans on a subset of columns.
Advantages:
- High Compression: Storing same-type data allows for Run-Length Encoding (RLE) or Dictionary Encoding.
- Vectorized Processing: Modern CPUs can process arrays of values (SIMD) much faster than mixed-type rows.
- I/O Efficiency: Only the specific columns requested in the SELECT clause are read from disk.
Disadvantages:
- Tuple Reconstruction: Combining columns to show a full record is computationally expensive.
- Write Penalty: Inserting one record requires updating multiple separate column files/pages.

System Archetypes: OLTP vs. OLAP

These storage methods serve two fundamentally different business needs.

OLTP (Online Transaction Processing)

Storage: Row-oriented.
Workload: Thousands of small, fast updates/reads (e.g., "Change Order Status," "Get User Profile").
Schema: Normalized (3NF) to ensure data integrity and minimize redundancy.
Focus: Low latency and high concurrency (ACID is non-negotiable).

OLAP (Online Analytical Processing)

Storage: Column-oriented.
Workload: Few, massive queries scanning millions of rows (e.g., "Year-over-year revenue by region").
Schema: Denormalized (Star/Snowflake) to reduce the number of joins.
Focus: Throughput and complex query performance.

Data Modeling for Analytics

When moving data from OLTP to OLAP, we reshape it for easier querying.

Star Schema: A central Fact Table (quantitative metrics) surrounded by Dimension Tables (descriptive attributes). This is the gold standard for performance.
Snowflake Schema: A variation where dimensions are normalized. It saves space but increases join complexity.
HTAP (Hybrid Transactional/Analytical Processing): Modern systems (like TiDB or SingleStore) that maintain both a row-store and a column-store of the same data simultaneously, allowing "real-time analytics" without an ETL delay.

Summary Table for Quick Reference

Feature

OLTP / Row-Store

OLAP / Column-Store

Primary Goal

Execution of transactions

Business Intelligence / Insights

Data Update

Frequent, small writes

Batch loads, rare updates

Query Pattern

Simple (JOINs on PKs)

Complex (Aggregations, Window Functions)

Compression

Moderate

Exceptional (10x - 100x)

Bottleneck

Disk Latency / Contention

Disk I/O Bandwidth / CPU

PreviousMetadata in DBMS NextVectorized Query Execution

Last updated 28 days ago

hashtagStorage Layout: Row vs. Column

hashtagRow-Oriented (Horizontal Partitioning)

hashtagColumn-Oriented (Vertical Partitioning)

hashtagSystem Archetypes: OLTP vs. OLAP

hashtagOLTP (Online Transaction Processing)

hashtagOLAP (Online Analytical Processing)

hashtagData Modeling for Analytics

hashtagSummary Table for Quick Reference