Consistency Guarantees in Streaming Systems

These guarantees describe what happens when messages flow through a distributed streaming pipeline where failures can occur at any point.

1. At-Most-Once Processing

Conceptual meaning: The message will be delivered zero or one times—never duplicated, but might be lost.

How it works:

  • The system acknowledges receipt of the message before processing it

  • If the processor crashes after acknowledging but before completing, the message is lost

  • No retries, no recovery mechanisms
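The ack-before-process flow can be sketched with an in-memory queue standing in for the broker (all names here are illustrative, not a real client API):

```python
from collections import deque

queue = deque(["reading-1", "reading-2", "reading-3"])
processed = []

def process(msg, fail=False):
    if fail:
        raise RuntimeError("processor crashed")
    processed.append(msg)

def consume_at_most_once(fail_on=None):
    while queue:
        msg = queue.popleft()  # "ack" first: the message leaves the queue immediately
        try:
            process(msg, fail=(msg == fail_on))
        except RuntimeError:
            pass               # no retry, no recovery: the message is simply lost

consume_at_most_once(fail_on="reading-2")
```

Because `reading-2` was removed from the queue before the crash, it is never processed and never redelivered, which is exactly the at-most-once failure mode.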

Trade-offs:

  • ✅ Lowest latency and overhead

  • ✅ Simple to implement

  • ❌ Data loss possible

  • ❌ No durability guarantees

Real-world example: Imagine a sensor network sending temperature readings every second. If one reading is lost, it's acceptable because another will arrive soon. The cost of duplicate processing (e.g., triggering alerts twice) might be worse than missing occasional data points.

Systems using this:

  • Kafka (when offsets are auto-committed before processing completes)

  • UDP-based protocols

  • Metrics systems like StatsD (losing one metric point is acceptable)


2. At-Least-Once Processing

Conceptual meaning: The message will be delivered one or more times—guaranteed delivery, but duplicates are possible.

How it works:

  • Process the message first, then acknowledge

  • If the processor crashes before acknowledging, the message is redelivered

  • The source retries until it receives confirmation
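Reversing the order—process first, ack last—can be sketched the same way (again with a toy in-memory queue, not a real broker API):

```python
from collections import deque

queue = deque(["order-1", "order-2"])
processed = []
crashed_once = set()

def process(msg):
    processed.append(msg)  # side effect happens first
    # simulate crashing after the side effect but before the ack, once per message
    if msg == "order-1" and msg not in crashed_once:
        crashed_once.add(msg)
        raise RuntimeError("crash before ack")

def consume_at_least_once():
    while queue:
        msg = queue[0]         # peek: do NOT remove the message yet
        try:
            process(msg)
            queue.popleft()    # ack only after processing succeeds
        except RuntimeError:
            pass               # message stays queued and is redelivered

consume_at_least_once()
```

`order-1` ends up processed twice: it is never lost, but the duplicate is exactly what downstream consumers must be prepared to handle.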

Trade-offs:

  • ✅ No message loss

  • ✅ Simpler than exactly-once (no complex coordination)

  • ❌ Downstream must handle duplicates

  • ❌ Slightly higher latency

Real-world example: An e-commerce order processing system. If a payment processor crashes after charging the card but before acknowledging, the message gets retried. Without idempotency checks, the customer might be charged twice. This is why payment systems typically make charges idempotent (checking if a transaction ID was already processed).
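A minimal sketch of such an idempotency check, using a dict to stand in for a durable transaction store (the function and IDs are hypothetical):

```python
charged = {}  # transaction_id -> amount; stands in for a durable store

def charge_card(txn_id, amount):
    """Idempotent charge: replaying the same transaction ID has no extra effect."""
    if txn_id in charged:
        return "duplicate-ignored"
    charged[txn_id] = amount
    return "charged"

first = charge_card("txn-42", 99.00)   # original delivery
second = charge_card("txn-42", 99.00)  # redelivery after a crash
```

The redelivered message is a no-op, so at-least-once delivery plus this check yields exactly one charge.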

Systems using this:

  • Kafka (with proper offset management and manual commits)

  • RabbitMQ (default behavior with acknowledgments)

  • AWS Kinesis

  • Apache Pulsar

  • Google Cloud Pub/Sub


3. Exactly-Once (Effectively-Once) Processing

Conceptual meaning: Each message produces exactly one effect in the output, even if it's processed multiple times internally.

Crucially, "exactly-once" refers to output semantics, not actual processing. The message might be processed multiple times, but through various techniques, only one result appears in the output.

How it works (multiple approaches):

A. Idempotent Operations

  • Design operations so that repeating them produces the same result as running them once

  • Examples: upserts keyed by a unique ID, inserting into a set, overwriting a fixed key

  • Duplicate deliveries then become harmless by construction

B. Transactional Coordination

  • Atomic commits: read input offset + write output + commit offset in one transaction

  • If any step fails, all roll back together

  • Kafka uses this with its transactional API
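A conceptual sketch of that atomic read-process-write cycle, using a toy transaction object rather than Kafka's actual transactional API (all names are illustrative):

```python
class Transaction:
    """Toy transaction: staged writes apply together on commit, or not at all."""
    def __init__(self, store):
        self.store = store
        self.staged = {}

    def write(self, key, value):
        self.staged[key] = value  # buffered, not yet visible

    def commit(self):
        self.store.update(self.staged)  # apply all staged writes together

store = {"input_offset": 0, "output": []}
messages = ["a", "b", "c"]

def process_one(crash_before_commit=False):
    offset = store["input_offset"]
    msg = messages[offset]
    txn = Transaction(store)
    txn.write("output", store["output"] + [msg.upper()])
    txn.write("input_offset", offset + 1)  # result and offset in the SAME txn
    if crash_before_commit:
        return       # nothing was applied, so the retry starts from scratch
    txn.commit()

process_one(crash_before_commit=True)  # crash: no offset moved, no output written
process_one()                          # retry: exactly one result appears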
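```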

C. Distributed State + Checkpointing

  • System periodically snapshots the entire processing state

  • On failure, restore from last checkpoint and replay

  • Apache Flink's approach
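The checkpoint-and-replay cycle can be sketched in a few lines (a simplified single-process model, not Flink's distributed implementation):

```python
import copy

state = {"count": 0}
checkpoint = None          # (log position, deep copy of state)
log = list(range(1, 7))    # durable event log we can replay from

def snapshot(position):
    global checkpoint
    checkpoint = (position, copy.deepcopy(state))

def run(crash_at=None):
    start = 0
    if checkpoint:
        start, saved = checkpoint
        state.update(copy.deepcopy(saved))  # restore last snapshot, discarding
                                            # any partial work after it
    for i in range(start, len(log)):
        if i == crash_at:
            raise RuntimeError("crash")
        state["count"] += log[i]
        if i % 3 == 2:
            snapshot(i + 1)                 # checkpoint every 3 events

try:
    run(crash_at=4)   # crash mid-stream, after the first checkpoint
except RuntimeError:
    pass
run()                 # restore from the checkpoint and replay the rest
```

Events after the checkpoint are processed twice across the two runs, but the restore discards the first, partial application, so the final state is as if each event counted exactly once.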

D. Deduplication

  • Assign unique IDs to messages

  • Track processed IDs in a database

  • Skip already-processed messages
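The three steps above amount to a small filter in front of the handler (an in-memory set stands in for the durable ID store):

```python
seen_ids = set()  # in production this would live in a durable database
results = []

def handle(msg_id, payload):
    if msg_id in seen_ids:
        return False          # already processed: skip the duplicate
    seen_ids.add(msg_id)
    results.append(payload)   # the actual (non-idempotent) side effect
    return True

# at-least-once delivery redelivers "m2" after a simulated crash
for msg_id, payload in [("m1", 10), ("m2", 20), ("m2", 20), ("m3", 30)]:
    handle(msg_id, payload)
```

Note the subtlety that makes this hard in practice: recording the ID and applying the side effect must themselves be atomic, which is why this approach often shades into the transactional one above.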

Trade-offs:

  • ✅ Clean, predictable semantics

  • ✅ Safe for non-idempotent operations (like incrementing counters)

  • ❌ Higher latency (coordination overhead)

  • ❌ More complex implementation

  • ❌ Requires additional infrastructure (state stores, transaction logs)

Real-world example: A financial ledger system processing transactions. Each transaction must update the account balance exactly once. Using Flink with checkpointing: the system snapshots state every 30 seconds. If it crashes at second 45, it restores to second 30 and replays 15 seconds of data. Duplicate messages from replay are detected and ignored via transaction IDs.

Systems with exactly-once support:

  • Apache Flink - Uses distributed snapshots (asynchronous barrier snapshotting, a variant of the Chandy-Lamport algorithm) + state backends

  • Kafka Streams - Transactional writes (read-process-write as atomic operation)

  • Google Cloud Dataflow - Provides exactly-once processing by default for streaming pipelines

  • Spark Structured Streaming - Idempotent sinks + write-ahead logs

  • Apache Beam - Framework supports it when runners implement it (Flink, Dataflow)


Why "Effectively-Once" is the Better Term

The term "exactly-once" is somewhat misleading because:

  1. Messages may be processed multiple times during retries or replay

  2. What's guaranteed is the observable effect, not the processing count

  3. It's impossible to guarantee that code literally executes once in a distributed system with failures

"Effectively-once" or "exactly-once semantics" better captures that we're talking about output correctness, not execution guarantees.


Choosing the Right Guarantee

Use at-most-once when:

  • Data loss is acceptable

  • Speed is critical

  • Processing duplicates is expensive or dangerous

Use at-least-once when:

  • You can make operations idempotent

  • Duplicates are easy to handle

  • You want good performance without extreme complexity

Use exactly-once when:

  • Operations aren't naturally idempotent (counters, aggregations)

  • Business logic requires precision (financial transactions)

  • You can accept the performance/complexity overhead
