Out-of-order events
What is Out-of-Order Data?
Simple Example
Events occur (event time):
Event A: 12:00:15
Event B: 12:00:30
Event C: 12:00:45
Event D: 12:01:00
Events arrive at system (processing time):
12:05:00 → Event B (12:00:30) ← Arrives first, but happened second
12:05:01 → Event D (12:01:00) ← Arrives second
12:05:02 → Event A (12:00:15) ← Arrives third, but happened FIRST!
12:05:03 → Event C (12:00:45) ← Arrives last
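The arrival sequence above can be checked mechanically. The following is a minimal sketch that uses the four example events; the helper names `t` and `out_of_order` are made up for illustration and do not come from any streaming library. It flags every event whose event time is earlier than the largest event time already seen.

```python
from datetime import datetime


def t(s):
    """Parse an HH:MM:SS string into a datetime (the date part is irrelevant here)."""
    return datetime.strptime(s, "%H:%M:%S")


# (name, event_time, arrival_time), listed in arrival order as in the example
arrivals = [
    ("B", t("12:00:30"), t("12:05:00")),
    ("D", t("12:01:00"), t("12:05:01")),
    ("A", t("12:00:15"), t("12:05:02")),
    ("C", t("12:00:45"), t("12:05:03")),
]

# Order in which the system sees the events vs. the order in which they happened
arrival_order = [name for name, _, _ in arrivals]
event_order = [name for name, _, _ in sorted(arrivals, key=lambda e: e[1])]


def out_of_order(events):
    """Return names of events that arrive after an event with a later event time."""
    flagged, max_seen = [], None
    for name, event_time, _ in events:
        if max_seen is not None and event_time < max_seen:
            flagged.append(name)
        else:
            max_seen = event_time
    return flagged


print(arrival_order)           # ['B', 'D', 'A', 'C']
print(event_order)             # ['A', 'B', 'C', 'D']
print(out_of_order(arrivals))  # ['A', 'C']
```

Events A and C are flagged because both arrive after Event D, which has a later event time.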
This is out-of-order!
Why Does Out-of-Order Data Happen?
1. Network Variability
2. Mobile/Offline Devices
3. Distributed Systems / Multiple Data Centers
4. Clock Skew
5. Parallel Processing Pipelines
6. Retries and Replays
Visual Representation

Why Out-of-Order Data Is Problematic
Problem 1: Wrong Window Assignment (with Processing-Time Windows)
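To make the window-assignment problem concrete, here is a small sketch that places the four example events into one-minute tumbling windows twice: once keyed by event time and once keyed by processing (arrival) time. The one-minute window size and the helper names `window_start` and `assign` are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime


def t(s):
    """Parse an HH:MM:SS string into a datetime."""
    return datetime.strptime(s, "%H:%M:%S")


# (name, event_time, processing_time) from the simple example
arrivals = [
    ("B", t("12:00:30"), t("12:05:00")),
    ("D", t("12:01:00"), t("12:05:01")),
    ("A", t("12:00:15"), t("12:05:02")),
    ("C", t("12:00:45"), t("12:05:03")),
]


def window_start(ts, size_seconds=60):
    """Return the start of the tumbling window containing ts, as HH:MM:SS."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    start = seconds - seconds % size_seconds
    return f"{start // 3600:02d}:{start % 3600 // 60:02d}:{start % 60:02d}"


def assign(events, time_index):
    """Group event names by window, using the timestamp at time_index."""
    windows = defaultdict(list)
    for event in events:
        windows[window_start(event[time_index])].append(event[0])
    return dict(windows)


by_event_time = assign(arrivals, 1)
by_processing_time = assign(arrivals, 2)

print(by_event_time)       # {'12:00:00': ['B', 'A', 'C'], '12:01:00': ['D']}
print(by_processing_time)  # {'12:05:00': ['B', 'D', 'A', 'C']}
```

With processing-time windows, all four events land in the 12:05 window even though none of them happened then, and the boundary between the 12:00 and 12:01 windows is lost.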
Problem 2: Incorrect Aggregations
Problem 3: Ordering-Dependent Operations Break
Problem 4: Impossible to Know "Completeness"
How Streaming Systems Handle Out-of-Order Data
Approach 1: Ignore It (Processing-Time Windows)
Approach 2: Buffer and Sort (Limited Reordering)
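One way to picture buffer-and-sort is a min-heap keyed by event time that releases an event only once a later event shows that enough time has passed. The sketch below is an illustrative model, not the implementation of any particular engine; the function `reorder` and the `max_delay` parameter are hypothetical, and event times are written as seconds after 12:00:00 to keep the arithmetic visible.

```python
import heapq

# (name, event_time in seconds after 12:00:00), in arrival order
arrivals = [("B", 30), ("D", 60), ("A", 15), ("C", 45)]


def reorder(events, max_delay):
    """Emit events in event-time order, tolerating up to max_delay seconds of disorder."""
    heap, out = [], []
    max_seen = float("-inf")
    for name, event_time in events:
        heapq.heappush(heap, (event_time, name))
        max_seen = max(max_seen, event_time)
        # Release everything old enough that no earlier event is still expected
        while heap and heap[0][0] <= max_seen - max_delay:
            out.append(heapq.heappop(heap)[1])
    # End of input: flush whatever is still buffered, in event-time order
    while heap:
        out.append(heapq.heappop(heap)[1])
    return out


print(reorder(arrivals, max_delay=45))  # ['A', 'B', 'C', 'D']
print(reorder(arrivals, max_delay=10))  # ['B', 'A', 'C', 'D']
```

The second call shows why the reordering is limited: with a 10-second buffer, Event B has already been released by the time Event A arrives, so the output is still out of order. A larger buffer fixes this at the cost of more memory and higher latency.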
Approach 3: Watermarks + Late Data Handling (Modern Approach)
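The watermark idea can be sketched in a few lines. In this simplified model, the watermark is assumed to be the largest event time seen so far minus a fixed out-of-orderness bound; a window fires once the watermark reaches its end, and any event that arrives for an already-fired window is routed to a separate late list. The function `run` and its parameters are hypothetical and do not mirror the API of any real streaming system. Event times are seconds after 12:00:00.

```python
# (name, event_time in seconds after 12:00:00), in arrival order
arrivals = [("B", 30), ("D", 60), ("A", 15), ("C", 45)]


def run(events, window_size, out_of_orderness):
    """Return (fired_windows, late_events) for event-time tumbling windows."""
    open_windows, fired, late = {}, {}, []
    watermark = float("-inf")
    for name, event_time in events:
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            # The window for this event has already fired: treat it as late data
            late.append(name)
            continue
        open_windows.setdefault(start, []).append(name)
        # Assumed watermark rule: max event time seen minus a fixed bound
        watermark = max(watermark, event_time - out_of_orderness)
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired[s] = open_windows.pop(s)
    # End of stream: fire every window that is still open
    for s in sorted(open_windows):
        fired[s] = open_windows.pop(s)
    return fired, late


print(run(arrivals, window_size=60, out_of_orderness=0))
# ({0: ['B'], 60: ['D']}, ['A', 'C'])
print(run(arrivals, window_size=60, out_of_orderness=45))
# ({0: ['B', 'A', 'C'], 60: ['D']}, [])
```

With a bound of 0 the first window fires as soon as Event D arrives, so A and C are reported as late. With a bound of 45 seconds the window stays open long enough to include them. The bound is the trade-off between completeness and latency.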
Key Takeaways