Consistency Guarantees in Streaming Systems

These guarantees describe what happens when messages flow through a distributed streaming pipeline where failures can occur at any point.

1. At-Most-Once Processing

Conceptual meaning: The message will be delivered zero or one times—never duplicated, but might be lost.

How it works:

  • The system acknowledges receipt of the message before processing it

  • If the processor crashes after acknowledging but before completing, the message is lost

  • No retries, no recovery mechanisms
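The ack-before-process flow can be sketched with an in-memory queue standing in for the broker (all names here are illustrative, not a real client API):

```python
from collections import deque

queue = deque(["reading-1", "reading-2", "reading-3"])
processed = []

def process(msg, fail=False):
    if fail:
        raise RuntimeError("processor crashed")
    processed.append(msg)

def consume_at_most_once(fail_on=None):
    while queue:
        msg = queue.popleft()  # "ack" first: the message leaves the queue immediately
        try:
            process(msg, fail=(msg == fail_on))
        except RuntimeError:
            pass               # no retry, no recovery: the message is simply lost

consume_at_most_once(fail_on="reading-2")
```

Because `reading-2` was removed from the queue before the crash, it is never processed and never redelivered, which is exactly the at-most-once failure mode.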

Trade-offs:

  • ✅ Lowest latency and overhead

  • ✅ Simple to implement

  • ❌ Data loss possible

  • ❌ No durability guarantees

Real-world example: Imagine a sensor network sending temperature readings every second. If one reading is lost, it's acceptable because another will arrive soon. The cost of duplicate processing (e.g., triggering alerts twice) might be worse than missing occasional data points.

Systems using this:

  • Kafka (when offsets are auto-committed before processing completes)

  • UDP-based protocols

  • Metrics systems like StatsD (losing one metric point is acceptable)


2. At-Least-Once Processing

Conceptual meaning: The message will be delivered one or more times—guaranteed delivery, but duplicates are possible.

How it works:

  • Process the message first, then acknowledge

  • If the processor crashes before acknowledging, the message is redelivered

  • The source retries until it receives confirmation
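Reversing the order—process first, ack last—can be sketched the same way (again with a toy in-memory queue, not a real broker API):

```python
from collections import deque

queue = deque(["order-1", "order-2"])
processed = []
crashed_once = set()

def process(msg):
    processed.append(msg)  # side effect happens first
    # simulate crashing after the side effect but before the ack, once per message
    if msg == "order-1" and msg not in crashed_once:
        crashed_once.add(msg)
        raise RuntimeError("crash before ack")

def consume_at_least_once():
    while queue:
        msg = queue[0]         # peek: do NOT remove the message yet
        try:
            process(msg)
            queue.popleft()    # ack only after processing succeeds
        except RuntimeError:
            pass               # message stays queued and is redelivered

consume_at_least_once()
```

`order-1` ends up processed twice: it is never lost, but the duplicate is exactly what downstream consumers must be prepared to handle.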

Trade-offs:

  • ✅ No message loss

  • ✅ Simpler than exactly-once (no complex coordination)

  • ❌ Downstream must handle duplicates

  • ❌ Slightly higher latency

Real-world example: An e-commerce order processing system. If a payment processor crashes after charging the card but before acknowledging, the message gets retried. Without idempotency checks, the customer might be charged twice. This is why payment systems typically make charges idempotent (checking if a transaction ID was already processed).
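A minimal sketch of such an idempotency check, using a dict to stand in for a durable transaction store (the function and IDs are hypothetical):

```python
charged = {}  # transaction_id -> amount; stands in for a durable store

def charge_card(txn_id, amount):
    """Idempotent charge: replaying the same transaction ID has no extra effect."""
    if txn_id in charged:
        return "duplicate-ignored"
    charged[txn_id] = amount
    return "charged"

first = charge_card("txn-42", 99.00)   # original delivery
second = charge_card("txn-42", 99.00)  # redelivery after a crash
```

The redelivered message is a no-op, so at-least-once delivery plus this check yields exactly one charge.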

Systems using this:

  • Kafka (with proper offset management and manual commits)

  • RabbitMQ (default behavior with acknowledgments)

  • AWS Kinesis

  • Apache Pulsar

  • Google Cloud Pub/Sub


3. Exactly-Once (Effectively-Once) Processing

Conceptual meaning: Each message produces exactly one effect in the output, even if it's processed multiple times internally.

Crucially, "exactly-once" refers to output semantics, not actual processing. The message might be processed multiple times, but through various techniques, only one result appears in the output.

How it works (multiple approaches):

A. Idempotent Operations

  • Design operations so that repeating them produces the same result as running them once

  • Examples: upserts keyed by a unique ID, inserting into a set, overwriting a fixed key

  • Duplicate deliveries then become harmless by construction

B. Transactional Coordination

  • Atomic commits: read input offset + write output + commit offset in one transaction

  • If any step fails, all roll back together

  • Kafka uses this with its transactional API
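A conceptual sketch of that atomic read-process-write cycle, using a toy transaction object rather than Kafka's actual transactional API (all names are illustrative):

```python
class Transaction:
    """Toy transaction: staged writes apply together on commit, or not at all."""
    def __init__(self, store):
        self.store = store
        self.staged = {}

    def write(self, key, value):
        self.staged[key] = value  # buffered, not yet visible

    def commit(self):
        self.store.update(self.staged)  # apply all staged writes together

store = {"input_offset": 0, "output": []}
messages = ["a", "b", "c"]

def process_one(crash_before_commit=False):
    offset = store["input_offset"]
    msg = messages[offset]
    txn = Transaction(store)
    txn.write("output", store["output"] + [msg.upper()])
    txn.write("input_offset", offset + 1)  # result and offset in the SAME txn
    if crash_before_commit:
        return       # nothing was applied, so the retry starts from scratch
    txn.commit()

process_one(crash_before_commit=True)  # crash: no offset moved, no output written
process_one()                          # retry: exactly one result appears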
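```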

C. Distributed State + Checkpointing

  • System periodically snapshots the entire processing state

  • On failure, restore from last checkpoint and replay

  • Apache Flink's approach
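The checkpoint-and-replay cycle can be sketched in a few lines (a simplified single-process model, not Flink's distributed implementation):

```python
import copy

state = {"count": 0}
checkpoint = None          # (log position, deep copy of state)
log = list(range(1, 7))    # durable event log we can replay from

def snapshot(position):
    global checkpoint
    checkpoint = (position, copy.deepcopy(state))

def run(crash_at=None):
    start = 0
    if checkpoint:
        start, saved = checkpoint
        state.update(copy.deepcopy(saved))  # restore last snapshot, discarding
                                            # any partial work after it
    for i in range(start, len(log)):
        if i == crash_at:
            raise RuntimeError("crash")
        state["count"] += log[i]
        if i % 3 == 2:
            snapshot(i + 1)                 # checkpoint every 3 events

try:
    run(crash_at=4)   # crash mid-stream, after the first checkpoint
except RuntimeError:
    pass
run()                 # restore from the checkpoint and replay the rest
```

Events after the checkpoint are processed twice across the two runs, but the restore discards the first, partial application, so the final state is as if each event counted exactly once.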

D. Deduplication

  • Assign unique IDs to messages

  • Track processed IDs in a database

  • Skip already-processed messages
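The three steps above amount to a small filter in front of the handler (an in-memory set stands in for the durable ID store):

```python
seen_ids = set()  # in production this would live in a durable database
results = []

def handle(msg_id, payload):
    if msg_id in seen_ids:
        return False          # already processed: skip the duplicate
    seen_ids.add(msg_id)
    results.append(payload)   # the actual (non-idempotent) side effect
    return True

# at-least-once delivery redelivers "m2" after a simulated crash
for msg_id, payload in [("m1", 10), ("m2", 20), ("m2", 20), ("m3", 30)]:
    handle(msg_id, payload)
```

Note the subtlety that makes this hard in practice: recording the ID and applying the side effect must themselves be atomic, which is why this approach often shades into the transactional one above.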

Trade-offs:

  • ✅ Clean, predictable semantics

  • ✅ Safe for non-idempotent operations (like incrementing counters)

  • ❌ Higher latency (coordination overhead)

  • ❌ More complex implementation

  • ❌ Requires additional infrastructure (state stores, transaction logs)

Real-world example: A financial ledger system processing transactions. Each transaction must update the account balance exactly once. Using Flink with checkpointing: the system snapshots state every 30 seconds. If it crashes at second 45, it restores to second 30 and replays 15 seconds of data. Duplicate messages from replay are detected and ignored via transaction IDs.

Systems with exactly-once support:

  • Apache Flink - Uses distributed snapshots (asynchronous barrier snapshotting, a variant of the Chandy-Lamport algorithm) + state backends

  • Kafka Streams - Transactional writes (read-process-write as atomic operation)

  • Google Cloud Dataflow - Provides exactly-once processing by default for streaming pipelines

  • Spark Structured Streaming - Idempotent sinks + write-ahead logs

  • Apache Beam - Framework supports it when runners implement it (Flink, Dataflow)


Why "Effectively-Once" is the Better Term

The term "exactly-once" is somewhat misleading because:

  1. Messages may be processed multiple times during retries or replay

  2. What's guaranteed is the observable effect, not the processing count

  3. It's impossible to guarantee that code literally executes once in a distributed system with failures

"Effectively-once" or "exactly-once semantics" better captures that we're talking about output correctness, not execution guarantees.


Choosing the Right Guarantee

Use at-most-once when:

  • Data loss is acceptable

  • Speed is critical

  • Processing duplicates is expensive or dangerous

Use at-least-once when:

  • You can make operations idempotent

  • Duplicates are easy to handle

  • You want good performance without extreme complexity

Use exactly-once when:

  • Operations aren't naturally idempotent (counters, aggregations)

  • Business logic requires precision (financial transactions)

  • You can accept the performance/complexity overhead
