Consistency Guarantees in Streaming Systems
These guarantees describe what happens when messages flow through a distributed streaming pipeline where failures can occur at any point.
1. At-Most-Once Processing
Conceptual meaning: The message will be delivered zero or one times—never duplicated, but might be lost.
How it works:
The system acknowledges receipt of the message before processing it
If the processor crashes after acknowledging but before completing, the message is lost
No retries, no recovery mechanisms
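The ack-before-process flow above can be sketched with a hypothetical in-memory queue (all names here are illustrative, not a real broker API):

```python
# At-most-once sketch: acknowledge BEFORE processing.
# A crash between the ack and the processing step loses the message for good,
# because the broker has already forgotten it and will never redeliver.

def consume_at_most_once(queue, results, crash_after_ack=False):
    message = queue.pop(0)      # receive + acknowledge: broker forgets it here
    if crash_after_ack:
        return                  # simulated crash: message is lost forever
    results.append(message)     # process

queue = ["reading-1", "reading-2"]
results = []
consume_at_most_once(queue, results, crash_after_ack=True)  # "reading-1" lost
consume_at_most_once(queue, results)                        # "reading-2" processed
print(results)  # only the second reading survives
```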
Trade-offs:
✅ Lowest latency and overhead
✅ Simple to implement
❌ Data loss possible
❌ No durability guarantees
Real-world example: Imagine a sensor network sending temperature readings every second. If one reading is lost, it's acceptable because another will arrive soon. The cost of duplicate processing (e.g., triggering alerts twice) might be worse than missing occasional data points.
Systems using this:
Kafka (when offsets are auto-committed before processing completes)
UDP-based protocols
Metrics systems like StatsD (losing one metric point is acceptable)
2. At-Least-Once Processing
Conceptual meaning: The message will be delivered one or more times—guaranteed delivery, but duplicates are possible.
How it works:
Process the message first, then acknowledge
If the processor crashes before acknowledging, the message is redelivered
The source retries until it receives confirmation
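The process-then-ack flow can be sketched the same way (hypothetical in-memory queue, illustrative names): the message stays on the queue until the consumer confirms it, which is exactly what makes duplicates possible.

```python
# At-least-once sketch: process FIRST, acknowledge after.
# A crash before the ack leaves the message on the queue, so it is
# redelivered, and the consumer can therefore see it twice.

def consume_at_least_once(queue, results, crash_before_ack=False):
    message = queue[0]          # peek; do not remove yet
    results.append(message)     # process
    if crash_before_ack:
        return                  # simulated crash: message stays queued
    queue.pop(0)                # acknowledge: now the broker forgets it

queue = ["order-42"]
results = []
consume_at_least_once(queue, results, crash_before_ack=True)  # processed, not acked
consume_at_least_once(queue, results)                         # redelivered: duplicate
print(results)  # the same order appears twice
```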
Trade-offs:
✅ No message loss
✅ Simpler than exactly-once (no complex coordination)
❌ Downstream must handle duplicates
❌ Slightly higher latency
Real-world example: An e-commerce order processing system. If a payment processor crashes after charging the card but before acknowledging, the message gets retried. Without idempotency checks, the customer might be charged twice. This is why payment systems typically make charges idempotent (checking if a transaction ID was already processed).
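The idempotency check described above can be sketched as a guard keyed on the transaction ID (hypothetical names; a production system would keep the ID set in a durable store, and real payment APIs expose similar "idempotency key" parameters):

```python
# Idempotent charge sketch: a retried delivery of the same transaction
# is detected by its ID and ignored instead of charging the card again.

processed_txns = set()   # in production: a durable store, not an in-memory set
charges = []

def charge_card(txn_id, amount):
    if txn_id in processed_txns:
        return "duplicate-ignored"      # retry of an already-applied charge
    charges.append((txn_id, amount))    # perform the side effect once
    processed_txns.add(txn_id)
    return "charged"

charge_card("txn-001", 99.95)   # first delivery: card is charged
charge_card("txn-001", 99.95)   # redelivery after a crash: safely ignored
print(len(charges))             # the customer is charged exactly once
```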
Systems using this:
Kafka (with proper offset management and manual commits)
RabbitMQ (default behavior with acknowledgments)
AWS Kinesis
Apache Pulsar
Google Cloud Pub/Sub
3. Exactly-Once (Effectively-Once) Processing
Conceptual meaning: Each message produces exactly one effect in the output, even if it's processed multiple times internally.
Note that "exactly-once" refers to output semantics, not actual processing. The message might be processed multiple times internally, but through various techniques, only one result appears in the output.
How it works (multiple approaches):
A. Idempotent Operations
Design operations so that applying them twice has the same effect as applying them once
Duplicates then become harmless, and no extra coordination is needed
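The difference between an idempotent update and a non-idempotent one can be shown in a few lines (illustrative values):

```python
# Idempotent vs. non-idempotent updates under duplicate delivery.
state = {"temperature": 0}

def set_temperature(state, value):
    state["temperature"] = value        # idempotent: replay gives same result

def add_temperature_delta(state, delta):
    state["temperature"] += delta       # NOT idempotent: replay skews the value

set_temperature(state, 21)
set_temperature(state, 21)      # duplicate delivery is harmless
print(state["temperature"])     # 21

add_temperature_delta(state, 1)
add_temperature_delta(state, 1)  # duplicate delivery double-counts
print(state["temperature"])      # 23, not 22
```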
B. Transactional Coordination
Atomic commits: read input offset + write output + commit offset in one transaction
If any step fails, all roll back together
Kafka uses this with its transactional API
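A simulated version of the atomic read-process-write cycle looks like this (in-memory "transaction" with hypothetical names; Kafka's transactional API is the real-world equivalent). The key property is that the output write and the offset advance become visible together or not at all:

```python
# Transactional coordination sketch: input offset + output are committed
# atomically. A crash before the commit rolls everything back, so the
# message is reprocessed from the same offset with no duplicate output.

def process_transactionally(log, state, crash_before_commit=False):
    offset = state["offset"]
    if offset >= len(log):
        return
    output = log[offset].upper()        # process the message
    if crash_before_commit:
        return                          # nothing staged is visible: full rollback
    # atomic commit: output and offset change together
    state["outputs"].append(output)
    state["offset"] = offset + 1

log = ["msg"]
state = {"offset": 0, "outputs": []}
process_transactionally(log, state, crash_before_commit=True)  # rolls back
process_transactionally(log, state)                            # retried once, cleanly
print(state)  # offset advanced once, exactly one output
```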
C. Distributed State + Checkpointing
System periodically snapshots the entire processing state
On failure, restore from last checkpoint and replay
Apache Flink's approach
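The snapshot-restore-replay cycle can be sketched as a loop (hypothetical single-process version; Flink's real mechanism coordinates distributed snapshots across operators). Note that the restore discards any un-checkpointed effects, so replaying the same events does not double-count them:

```python
# Checkpointing sketch: state is snapshotted every N events; on failure,
# restore the last snapshot and replay the events processed since it.

def run_with_checkpoints(events, checkpoint_every, crash_at=None):
    state = {"count": 0}
    checkpoint = dict(state)            # initial snapshot
    replay_from = 0
    i = 0
    while i < len(events):
        if i == crash_at:
            state = dict(checkpoint)    # restore: un-checkpointed work is lost
            i = replay_from             # replay events since the snapshot
            crash_at = None
        state["count"] += events[i]     # process one event
        i += 1
        if i % checkpoint_every == 0:
            checkpoint = dict(state)    # snapshot the state
            replay_from = i

    return state

print(run_with_checkpoints([1, 1, 1, 1, 1], checkpoint_every=2, crash_at=3))
# the crash discards one un-checkpointed event; replay reprocesses it: count == 5
```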
D. Deduplication
Assign unique IDs to messages
Track processed IDs in a database
Skip already-processed messages
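The three dedup steps above reduce to a guard on a set of seen IDs (hypothetical names; a real pipeline would persist the ID set durably, typically with a TTL to bound its growth):

```python
# Deduplication sketch: each message carries a unique ID; a message whose
# ID has already been recorded is dropped, so replayed deliveries produce
# no additional effect downstream.

seen_ids = set()
output = []

def handle(message_id, payload):
    if message_id in seen_ids:
        return False                # already processed: drop the duplicate
    seen_ids.add(message_id)
    output.append(payload)          # exactly one observable effect per ID
    return True

handle("m1", "deposit 100")
handle("m1", "deposit 100")   # redelivered copy, ignored
handle("m2", "deposit 50")
print(output)  # two effects for two distinct messages
```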
Trade-offs:
✅ Clean, predictable semantics
✅ Safe for non-idempotent operations (like incrementing counters)
❌ Higher latency (coordination overhead)
❌ More complex implementation
❌ Requires additional infrastructure (state stores, transaction logs)
Real-world example: A financial ledger system processing transactions. Each transaction must update the account balance exactly once. Using Flink with checkpointing: the system snapshots state every 30 seconds. If it crashes at second 45, it restores to second 30 and replays 15 seconds of data. Duplicate messages from replay are detected and ignored via transaction IDs.
Systems with exactly-once support:
Apache Flink - Uses distributed snapshots (a variant of the Chandy-Lamport algorithm) + state backends
Kafka Streams - Transactional writes (read-process-write as atomic operation)
Google Cloud Dataflow - Built on exactly-once foundations
Spark Structured Streaming - Idempotent sinks + write-ahead logs
Apache Beam - Framework supports it when runners implement it (Flink, Dataflow)
Why "Effectively-Once" is the Better Term
The term "exactly-once" is somewhat misleading because:
Messages may be processed multiple times during retries or replay
What's guaranteed is the observable effect, not the processing count
It's impossible to guarantee that code literally executes once in a distributed system with failures
"Effectively-once" or "exactly-once semantics" better captures that we're talking about output correctness, not execution guarantees.
Choosing the Right Guarantee
Use at-most-once when:
Data loss is acceptable
Speed is critical
Processing duplicates is expensive or dangerous
Use at-least-once when:
You can make operations idempotent
Duplicates are easy to handle
You want good performance without extreme complexity
Use exactly-once when:
Operations aren't naturally idempotent (counters, aggregations)
Business logic requires precision (financial transactions)
You can accept the performance/complexity overhead