Streaming SQL
The Core Premise:
For years, people tried to invent dedicated "Stream Processing Languages" (CQL, Esper's EPL, and others). None of them gained mass adoption.
The book argues that we don't need a new language. SQL—if correctly applied—is actually the perfect language for streaming.
Why? Because SQL is declarative.
Imperative (Java/Python): You tell the computer how to process the data (loop here, wait for watermark there).
Declarative (SQL): You tell the computer what you want (the result), and the engine figures out the mechanics (state, windowing, triggering).
The "Dynamic Table" (Flink's Core Model)
This is the mental leap you need to make.
Standard SQL: Executes a query once against a snapshot of data, produces a static result, and quits.
Streaming SQL: Executes a Continuous Query against a table that is constantly changing.
The Concept: A query on a stream produces a Dynamic Table. This table is updated continuously as new events arrive.
Time-Varying Relations
In standard SQL, if you run SELECT COUNT(*) FROM Clicks, you get a number (e.g., 10).
In Streaming SQL, that same query produces a stream of updates:
At 12:00: 10
At 12:01: 11
At 12:02: 12
The output is a time-varying relation: a table whose contents change over time as the input changes.
Windowing in SQL
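This is where the concepts we discussed earlier (Windowing Types) get mapped to actual code. These "Group By Window" operations are not part of core standard SQL; engines such as Flink add them as extensions (Flink's newer windowing table-valued functions build on the polymorphic table functions introduced in SQL:2016).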
A. Tumbling Window (Fixed)
Instead of just GROUP BY User, you group by the window function:
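A minimal sketch in Flink SQL's group-window syntax, assuming a Clicks table with hypothetical columns user_id and event_time (declared as an event-time attribute). The one-minute window size is chosen purely for illustration:

```sql
-- Count clicks per user in fixed, non-overlapping 1-minute windows.
-- user_id and event_time are assumed column names, not from the original text.
SELECT
  user_id,
  TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
  TUMBLE_END(event_time, INTERVAL '1' MINUTE)   AS window_end,
  COUNT(*) AS clicks
FROM Clicks
GROUP BY
  user_id,
  TUMBLE(event_time, INTERVAL '1' MINUTE);
```

Each event falls into exactly one window, and a window's row is emitted once the watermark passes the window end.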
B. Hopping Window (Sliding)
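A hopping window has a fixed size but starts a new window every slide interval, so windows overlap and one event can be counted in several of them. A minimal sketch in Flink SQL's group-window syntax, assuming the same hypothetical Clicks columns (user_id, event_time) and an illustrative 5-minute window that advances every minute:

```sql
-- Count clicks per user over the last 5 minutes, refreshed every 1 minute.
-- In Flink's group-window HOP, the slide comes first, then the window size.
SELECT
  user_id,
  HOP_START(event_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE) AS window_start,
  HOP_END(event_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE)   AS window_end,
  COUNT(*) AS clicks
FROM Clicks
GROUP BY
  user_id,
  HOP(event_time, INTERVAL '1' MINUTE, INTERVAL '5' MINUTE);
```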
Watermarks in SQL
You might wonder: How does SQL know about Event Time?
In Flink SQL, when you define the table (CREATE TABLE), you explicitly define the Watermark strategy in the DDL (Data Definition Language).
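A DDL of that shape might look like the sketch below. The column names (user_id, url) and the datagen connector are assumptions for illustration; a real pipeline would typically point at Kafka or another source:

```sql
-- Declare event_time as the event-time attribute and tolerate
-- events that arrive up to 5 seconds out of order.
CREATE TABLE Clicks (
  user_id    STRING,
  url        STRING,
  event_time TIMESTAMP(3),
  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
  'connector' = 'datagen',      -- assumed test source that generates random rows
  'rows-per-second' = '5'
);
```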
This tells the SQL engine: "Use the event_time column as the clock, and expect up to 5 seconds of lag."
Stream-Stream Joins (The Hardest Part)
The chapter spends significant time on Joins. Joining two infinite streams is tricky.
Interval Joins: "Join Click with View IF the Click happened within 10 minutes of the View."
SQL Syntax:
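A minimal sketch of such an interval join in Flink SQL, assuming Clicks and Views tables that share a hypothetical ad_id key and each expose an event-time attribute (click_time and view_time are assumed names):

```sql
-- Match each click to a view of the same ad that happened
-- at most 10 minutes before the click.
SELECT
  c.ad_id,
  v.view_time,
  c.click_time
FROM Clicks AS c
JOIN Views AS v
  ON c.ad_id = v.ad_id
 AND c.click_time BETWEEN v.view_time
                      AND v.view_time + INTERVAL '10' MINUTE;
```

The time bound in the join condition is what lets the engine recognize this as an interval join rather than an unbounded regular join.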
State implication: The engine knows it only needs to keep Views in memory for 10 minutes. After that, it can drop them (State Cleanup), keeping the footprint small.
Summary
Model: Stream → Dynamic Table → Continuous Query → Result Stream.
Syntax: Uses standard SQL with temporal extensions (TUMBLE, HOP, SESSION).
Execution: The engine (Flink) automatically handles State, Checkpointing, and Watermarks based on your declarative query.
Power: This democratizes streaming. You don't need to be a Java engineer to build a complex, exactly-once pipeline; you just need to know SQL.