OLTP/OLAP systems
Storage Layout: Row vs. Column
The physical orientation of data on disk determines which access patterns are "cheap" and which are "expensive."
Row-Oriented (Horizontal Partitioning)
Physical Layout: Fields of the same record are stored contiguously in a single page.
Example:
[ID1, Name1, Age1], [ID2, Name2, Age2]
Best For: Point Lookups (Accessing a specific record by ID) and Heavy Writes (Appending a full record).
Advantages: * One I/O seek retrieves the entire record.
Simple indexing (Primary Keys point to a specific file offset).
Disadvantages: * "I/O Bloat": To calculate
AVG(age), the DB must readNameandIDinto memory even though they aren't needed.Lower compression due to mixed data types in a single block.
Column-Oriented (Vertical Partitioning)
Physical Layout: Values of the same column are stored together across pages.
Example:
Names: [Alice, Bob]; Ages: [30, 25]
Best For: Aggregations (SUM, AVG) and Broad Scans on a subset of columns.
Advantages:
High Compression: Storing same-type data allows for Run-Length Encoding (RLE) or Dictionary Encoding.
Vectorized Processing: Modern CPUs can process arrays of values (SIMD) much faster than mixed-type rows.
I/O Efficiency: Only the specific columns requested in the
SELECTclause are read from disk.
Disadvantages:
Tuple Reconstruction: Combining columns to show a full record is computationally expensive.
Write Penalty: Inserting one record requires updating multiple separate column files/pages.
System Archetypes: OLTP vs. OLAP
These storage methods serve two fundamentally different business needs.
OLTP (Online Transaction Processing)
Storage: Row-oriented.
Workload: Thousands of small, fast updates/reads (e.g., "Change Order Status," "Get User Profile").
Schema: Normalized (3NF) to ensure data integrity and minimize redundancy.
Focus: Low latency and high concurrency (ACID is non-negotiable).
OLAP (Online Analytical Processing)
Storage: Column-oriented.
Workload: Few, massive queries scanning millions of rows (e.g., "Year-over-year revenue by region").
Schema: Denormalized (Star/Snowflake) to reduce the number of joins.
Focus: Throughput and complex query performance.
Data Modeling for Analytics
When moving data from OLTP to OLAP, we reshape it for easier querying.
Star Schema: A central Fact Table (quantitative metrics) surrounded by Dimension Tables (descriptive attributes). This is the gold standard for performance.
Snowflake Schema: A variation where dimensions are normalized. It saves space but increases join complexity.
HTAP (Hybrid Transactional/Analytical Processing): Modern systems (like TiDB or SingleStore) that maintain both a row-store and a column-store of the same data simultaneously, allowing "real-time analytics" without an ETL delay.
Summary Table for Quick Reference
Feature
OLTP / Row-Store
OLAP / Column-Store
Primary Goal
Execution of transactions
Business Intelligence / Insights
Data Update
Frequent, small writes
Batch loads, rare updates
Query Pattern
Simple (JOINs on PKs)
Complex (Aggregations, Window Functions)
Compression
Moderate
Exceptional (10x - 100x)
Bottleneck
Disk Latency / Contention
Disk I/O Bandwidth / CPU
Last updated