Streams and Tables


Streams and Tables (The Duality)

The Core Insight:

For decades, the industry treated "Streaming" and "Tables" (Databases) as two completely different animals.

  • Streaming: Moving data, transient, low latency (e.g., Kafka).

  • Tables: Static data, persistent, queryable (e.g., MySQL).

The book argues that they are not different things; they are the same thing viewed from different angles. This is the Stream/Table Duality.

"Streams are tables in motion. Tables are streams at rest."

The Cycle of Life

To understand the duality, you must visualize how data moves between these two states.

A. Stream → Table (Aggregation)

How do you turn a stream into a table? You group it.

  • Input: A Stream of raw events (User A clicked, User B clicked, User A clicked).

  • Operation: GROUP BY User and COUNT.

  • Output: A Table (User A: 2, User B: 1).

  • Theory: A table is just the current state of an aggregation over a stream.
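The click-counting example above can be sketched in a few lines of Python. This is only an illustration, not code from the book: the function name `count_clicks` and the use of a plain list as the "stream" are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def count_clicks(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Fold a stream of user ids into a running per-user count.

    Yields a snapshot of the table after every event.
    """
    table: Counter = Counter()
    for user in events:
        table[user] += 1   # GROUP BY user, COUNT
        yield dict(table)  # the table "as of now"


# Hypothetical input: User A clicked, User B clicked, User A clicked.
clicks = ["A", "B", "A"]
snapshots = list(count_clicks(clicks))
final_table = snapshots[-1]
print(final_table)  # {'A': 2, 'B': 1}
```

Each snapshot is the table at one point in time; the last one matches the Output bullet above.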

B. Table → Stream (Observation)

How do you turn a table into a stream? You observe the changes.

  • Input: A Table (MySQL Database).

  • Operation: Change Data Capture (CDC). Every time a row changes, emit a change record.

  • Output: A Stream of updates (UPDATE User A SET Count = 3).

  • Theory: A stream is just the history of changes made to a table.
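A toy version of this idea, assuming an in-memory table instead of MySQL: the `ObservedTable` class and `replay` function are hypothetical names invented for this sketch and do not correspond to any real CDC API. Real CDC tools typically read the database's own replication log rather than wrapping writes like this.

```python
from typing import Dict, List, Optional, Tuple

# A change record is (key, old_value, new_value); old_value is None for inserts.
Change = Tuple[str, Optional[int], int]


class ObservedTable:
    """A toy in-memory table that logs every write (illustrative only)."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}
        self.changelog: List[Change] = []

    def upsert(self, key: str, value: int) -> None:
        old = self.rows.get(key)
        if old == value:
            return  # nothing changed, so nothing is emitted
        self.rows[key] = value
        self.changelog.append((key, old, value))


def replay(changelog: List[Change]) -> Dict[str, int]:
    """Rebuild the table by applying the changes in order."""
    rows: Dict[str, int] = {}
    for key, _old, new in changelog:
        rows[key] = new
    return rows


table = ObservedTable()
table.upsert("A", 2)
table.upsert("B", 1)
table.upsert("A", 3)

print(table.changelog)  # [('A', None, 2), ('B', None, 1), ('A', 2, 3)]
print(replay(table.changelog) == table.rows)  # True
```

The last line shows the round trip: observing the table yields a stream, and replaying that stream yields the table again.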

Why This Matters (The "Aha!" Moment)

Why does the book spend a whole chapter on this? Because it lays the groundwork for solving the "Streaming SQL" problem.

If you want to use SQL (a language designed for static Tables) on a Stream (moving data), you must understand this duality:

  1. Stream Processing is actually just calculating a Table that is constantly updating.

  2. Triggers (which we discussed earlier) are just the mechanism for deciding when to "broadcast" the Table's changes back out as a Stream.
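The two points above can be sketched together: a table is updated on every event, and a trigger decides when its state is emitted downstream. The count-based trigger here (`every`) is an assumption chosen for simplicity, and `count_with_trigger` is a hypothetical name; it is not meant to represent the trigger types of any particular system.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def count_with_trigger(events: Iterable[str], every: int) -> Iterator[Dict[str, int]]:
    """Maintain a per-user count table; emit a snapshot every `every` events."""
    table: Counter = Counter()
    for seen, user in enumerate(events, start=1):
        table[user] += 1       # the table is always up to date internally
        if seen % every == 0:  # the trigger condition
            yield dict(table)  # the table's state leaves as a stream element


clicks = ["A", "B", "A", "A"]
per_record = list(count_with_trigger(clicks, every=1))
every_two = list(count_with_trigger(clicks, every=2))

print(len(per_record))  # 4
print(every_two)        # [{'A': 1, 'B': 1}, {'A': 3, 'B': 1}]
```

The aggregation logic is identical in both runs; only the trigger changes how often the table is observed.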

The Grand Unification

This chapter unites the two worlds:

  • Batch Processing: You process the Stream up to a fixed point, build a Table, and stop.

  • Stream Processing: You process the Stream continuously, keeping the Table live forever.

It is the same math. The only difference is that one stops and the other doesn't.
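A minimal sketch of "same math, different stopping rule", assuming the same click-counting fold as the running example. The names `running_counts` and `batch` are hypothetical, and `itertools.cycle` merely stands in for an endless stream.

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, Iterator


def running_counts(events: Iterable[str]) -> Iterator[Dict[str, int]]:
    """The shared logic: fold events into a per-user count table."""
    table: Counter = Counter()
    for user in events:
        table[user] += 1
        yield dict(table)


def batch(events: Iterable[str]) -> Dict[str, int]:
    """Batch: consume a bounded input and keep only the final table."""
    result: Dict[str, int] = {}
    for result in running_counts(events):
        pass
    return result


bounded = ["A", "B", "A"]
unbounded = itertools.cycle(bounded)  # never ends

batch_table = batch(bounded)
# Streaming: the same fold never finishes, so we peek at the table after 3 events.
live_table = next(itertools.islice(running_counts(unbounded), 2, None))

print(batch_table)                # {'A': 2, 'B': 1}
print(batch_table == live_table)  # True
```

Calling `batch` on the unbounded input would never return, which is exactly the difference the text describes.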

Summary

  • Duality: Streams and Tables are two views of the same data; each can be derived from the other.

  • Integration: Aggregation turns Streams into Tables; Triggers turn Tables into Streams.

  • Implication: This theory lets us run SQL queries on unbounded data streams while keeping the semantics of the language largely intact.

