Accumulation: Managing Result Refinement
The Core Problem:
Because we are using Triggers, a single window (e.g., 12:00–12:05) might produce multiple outputs over time.
Output 1 (Early): Count = 10
Output 2 (On-Time): Count = 25
Output 3 (Late): Count = 28
The Question: How should the downstream system (the database or dashboard consuming this stream) interpret these numbers?
Should it add 25 to 10? (Total = 35: wrong, because 25 already includes the first 10.)
Should it replace 10 with 25? (Total = 25: correct.)
The Solution: Accumulation Modes
Accumulation defines the "semantics of refinement." It tells the system how the current result relates to the previous results for the same window.

The Three Accumulation Modes
A. Discarding (Deltas)
Logic: Every time the trigger fires, the system calculates the result for the new data only, emits it, and then throws away the state.
Message: "Here is a chunk of new data. You figure out the total."
Sequence:
10 ... 15 (meaning +15 new ones) ... 3 (meaning +3 late ones).
Use Case: When the downstream system is an aggregator itself (e.g., sending data to a metric store that sums everything up).
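A minimal sketch of discarding mode, using the counts from the 12:00–12:05 example. The function name `discarding_panes` and the `batches` input are hypothetical, chosen only for illustration; this is plain Python, not the API of any particular streaming engine.

```python
def discarding_panes(batches):
    """Yield one delta per trigger firing; state is reset for every pane."""
    for batch in batches:
        state = 0              # fresh state: nothing carried over from earlier panes
        for _ in batch:
            state += 1         # count only the events in this pane
        yield state            # emit the delta, then forget it


# Assumed arrivals for one window: early, on-time, and late events.
batches = [range(10), range(15), range(3)]
deltas = list(discarding_panes(batches))

# The downstream aggregator (e.g. a metric store) does the summing itself.
total = sum(deltas)
print(deltas, total)
```

The pipeline never holds the running total here; correctness depends on the consumer adding every delta exactly once.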
B. Accumulating (Snapshots)
Logic: The system retains the state. Every time the trigger fires, it calculates the current total of everything seen so far.
Message: "The current total is now X." (Overwrite the old value).
Sequence:
10 ... 25 (10+15) ... 28 (25+3).
Use Case: Overwrite-capable storage like Key/Value stores (BigTable, DynamoDB, Redis). You just PUT the new value, overwriting the old 10 with 25.
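A minimal sketch of accumulating mode feeding an overwrite-style store. The dictionary `store` stands in for a key/value database, and `accumulating_panes` and `window_key` are hypothetical names used only for this illustration.

```python
def accumulating_panes(batches):
    """Yield the running total after each trigger firing; state is retained."""
    state = 0
    for batch in batches:
        state += len(batch)    # add the new events to everything seen so far
        yield state            # emit a full snapshot of the window


store = {}                     # stand-in for a key/value store
window_key = "12:00-12:05"     # assumed key: one row per window

snapshots = []
for snapshot in accumulating_panes([range(10), range(15), range(3)]):
    store[window_key] = snapshot   # PUT: the newest snapshot overwrites the old one
    snapshots.append(snapshot)

print(snapshots, store)
```

Because each write replaces the previous value, the consumer needs no arithmetic at all; the latest snapshot is the answer.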
C. Accumulating & Retracting (The "Undo" Button)
Logic: The system provides the new total, but also provides a "retraction" for the previous total.
Message: "I previously told you the total was 10. I was wrong. Subtract 10, and add 25."
Sequence:
Emit 10
Retract -10, Emit 25
Retract -25, Emit 28
Use Case: Complex downstream systems that perform grouping or summing (like SQL). If you just sent 25, a downstream SUM operation might add 10 + 25 = 35 (double counting). The retraction ensures the math stays clean: 10 - 10 + 25 = 25.
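A minimal sketch of why retractions keep a downstream SUM correct. The message shape (`"emit"` / `"retract"` tuples) and the function names are assumptions made for this illustration, not the wire format of any real system.

```python
def retracting_panes(batches):
    """Yield ("retract", old_total) before each new ("emit", new_total)."""
    state = 0
    previous = None
    for batch in batches:
        state += len(batch)
        if previous is not None:
            yield ("retract", previous)   # undo the total announced earlier
        yield ("emit", state)             # announce the corrected total
        previous = state


def downstream_sum(messages):
    """A consumer that blindly sums, treating retractions as negative values."""
    total = 0
    for kind, value in messages:
        total += value if kind == "emit" else -value
    return total


messages = list(retracting_panes([range(10), range(15), range(3)]))
with_retractions = downstream_sum(messages)

# Without retractions, the same summing consumer double counts.
emits_only = [m for m in messages if m[0] == "emit"]
without_retractions = downstream_sum(emits_only)

print(with_retractions, without_retractions)
```

The summing consumer is unchanged in both runs; only the presence of the retraction messages decides whether its result matches the true count.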
Comparison Table: processing the sequence [1, 3, 2]
Imagine a window receives three events with values 1, 3, and 2. A trigger fires after every single event.
| Event Arrives | Discarding Mode Output | Accumulating Mode Output | Acc. & Retracting Mode Output |
| --- | --- | --- | --- |
| Input: 1 | 1 | 1 | 1 |
| Input: 3 | 3 | 4 (1+3) | -1, 4 |
| Input: 2 | 2 | 6 (4+2) | -4, 6 |
| Final View | Consumer must sum: 1+3+2=6 | Consumer sees latest: 6 | Consumer sums all: 1+(−1+4)+(−4+6)=6 |
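The comparison for the sequence [1, 3, 2] can be reproduced with a short sketch. It assumes a trigger that fires after every event and a simple sum as the aggregation; the variable names are illustrative only.

```python
from itertools import accumulate

events = [1, 3, 2]

# Discarding: each pane carries only the newly arrived value.
discarding = list(events)

# Accumulating: each pane carries the running total so far.
accumulating = list(accumulate(events))

# Accumulating & retracting: each pane negates the previous total, then emits the new one.
retracting = []
previous = None
for total in accumulating:
    pane = [] if previous is None else [-previous]
    pane.append(total)
    retracting.append(pane)
    previous = total

# Each consumer strategy arrives at the same final view.
final_discarding = sum(discarding)                           # consumer sums the deltas
final_accumulating = accumulating[-1]                        # consumer keeps the latest
final_retracting = sum(v for pane in retracting for v in pane)  # consumer sums everything

print(discarding, accumulating, retracting)
```

All three modes agree on the final value of 6; they differ only in what the consumer has to do to get there.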
Summary of the "What, Where, When, How" Framework
You have now documented the complete architectural solution for Unbounded Data:
What: Transformations (The Math).
Where: Event Time Windowing (The Buckets).
When: Watermarks (Completeness) + Triggers (Latency/Correctness).
How: Accumulation (Refinement Semantics).
This separates the Logical definition of your data pipeline from the Physical execution, solving the "Batch Coupling" problem we identified at the start.