Are workflow orchestrators used with streaming?
Workflow orchestrators (Airflow, Prefect, Dagster, Step Functions) are not usually used to run streaming jobs, but they are used to start, monitor, deploy, or manage them.
Here’s the breakdown.
✅ Why Orchestrators Are NOT Usually Used to Run Streaming Jobs
Streaming jobs (Flink, Spark Structured Streaming, Kafka Streams) are:
long-running
continuously running (24/7)
event-driven, not schedule-driven
stateful (continuous checkpoints, watermarks)
not meant to be killed and restarted every few minutes
Workflow orchestrators are designed for:
Batch tasks
Finite jobs
Retry → success/failure → exit
DAG dependencies
These assumptions don’t match streaming workloads.
Example:
Airflow expects tasks to: start, do a bounded amount of work, and exit with success or failure within a predictable window.
But a streaming job behaves like: start once, run indefinitely, and never exit on its own.
Wrap one in an ordinary Airflow task and Airflow will time out, retry, fail the DAG, or restart the job in a way that loses track of its checkpointed state.
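Here is a minimal sketch of that mismatch (the DAG name, command, and timeout are placeholders, written against Airflow 2.x): the streaming submit blocks forever, so the task can never succeed.

```python
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="streaming_job_antipattern",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",                   # a batch-style schedule makes no sense for a 24/7 stream
    catchup=False,
) as dag:
    # spark-submit blocks for as long as the stream runs, so this task never
    # completes: Airflow kills it when execution_timeout expires, retries it,
    # and eventually marks the DAG run failed.
    run_stream = BashOperator(
        task_id="run_streaming_job",
        bash_command="spark-submit streaming_job.py",
        execution_timeout=timedelta(hours=1),
        retries=2,
    )
```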
✅ So How Do People Manage Streaming Jobs Instead?
They use streaming-native orchestration and cluster managers, such as:
Flink’s built-in JobManager
Submits jobs
Manages checkpoints
Restarts with exactly-once guarantees
Spark Structured Streaming on Databricks / EMR / Kubernetes
Jobs run as long-lived services
Checkpoint directories restore state (see the sketch at the end of this section)
Cluster managers handle failures, scaling
Kafka Streams
Self-managed inside the client library
Scaling = add/remove instances
Failover built-in
Cloud-native streaming services
Google Dataflow (Apache Beam)
AWS Kinesis Data Analytics
Azure Stream Analytics
These are managed streaming runtimes.
These systems handle:
auto-restarts
scaling
state recovery
continuous execution
Workflow orchestrators cannot replace these.
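To make the "long-lived service with recoverable state" point concrete, here is a minimal Spark Structured Streaming sketch (the Kafka broker, topic, and bucket paths are hypothetical). The job runs until it is explicitly stopped, and the checkpoint location is what lets a replacement run resume exactly where the previous one left off; no workflow orchestrator is involved in keeping it alive.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Continuously read from a Kafka topic (broker address and topic are placeholders)
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# The checkpoint location stores offsets and state, so a restarted job
# picks up from where the previous run stopped.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://my-bucket/bronze/events/")
    .option("checkpointLocation", "s3a://my-bucket/checkpoints/events/")
    .trigger(processingTime="1 minute")
    .start()
)

query.awaitTermination()   # blocks indefinitely -- this is a service, not a task
```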
✅ When Orchestrators ARE Used with Streaming Jobs
Even though orchestrators do not run streaming jobs, they do orchestrate around them.
✔ Deploying a streaming job
Ship new JAR/py file to cluster
Submit job to Flink/Spark/Kubernetes
Trigger rolling upgrade
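A hedged sketch of what "deploy" looks like from the orchestrator's side (the DAG, artifact paths, and file names are placeholders): each task finishes in seconds because `flink run -d` detaches after submission; the JobManager, not Airflow, keeps the stream running afterwards.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="deploy_streaming_job",   # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule=None,                   # run on demand (e.g. from CI), not on a timer
    catchup=False,
) as dag:
    # Ship the new artifact to a location the Flink client can read
    ship_jar = BashOperator(
        task_id="ship_jar",
        bash_command="cp ./build/pipeline.jar /opt/artifacts/pipeline.jar",
    )
    # 'flink run -d' detaches after submitting the job, so this task exits
    # quickly while the JobManager takes over the long-running stream.
    submit_job = BashOperator(
        task_id="submit_job",
        bash_command="flink run -d /opt/artifacts/pipeline.jar",
    )
    ship_jar >> submit_job
```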
✔ Monitoring
DAG tasks ping job health API
Alert if the job is not running
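As a sketch (the JobManager address and check interval are assumptions), a monitoring DAG can poll Flink's REST API and fail loudly, triggering normal Airflow alerting, when no job is in the RUNNING state:

```python
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

FLINK_REST = "http://flink-jobmanager:8081"   # hypothetical JobManager address


def check_stream_health():
    # /jobs/overview is part of Flink's REST API; raise to fail the task
    # (and fire whatever alerting is configured on the DAG) if nothing is RUNNING.
    jobs = requests.get(f"{FLINK_REST}/jobs/overview", timeout=10).json()["jobs"]
    if not any(job["state"] == "RUNNING" for job in jobs):
        raise RuntimeError("No RUNNING Flink job found")


with DAG(
    dag_id="streaming_health_check",   # hypothetical
    start_date=datetime(2024, 1, 1),
    schedule=timedelta(minutes=15),
    catchup=False,
) as dag:
    PythonOperator(task_id="check_stream_health", python_callable=check_stream_health)
```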
✔ Bootstrapping
Run a preparation step:
create topics
create checkpoint buckets
upload configs
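A minimal bootstrap sketch using the confluent-kafka admin client (broker address, topic names, and partition counts are all placeholders); a one-off orchestrator task like this runs before the streaming job is first submitted:

```python
from confluent_kafka.admin import AdminClient, NewTopic

# Broker address and topic names are hypothetical
admin = AdminClient({"bootstrap.servers": "broker:9092"})

topics = [
    NewTopic("events", num_partitions=12, replication_factor=3),
    NewTopic("events-dlq", num_partitions=3, replication_factor=3),
]

# create_topics() is asynchronous: wait on each returned future and treat
# "topic already exists" as a non-error so the bootstrap step stays idempotent.
for name, future in admin.create_topics(topics).items():
    try:
        future.result()
        print(f"created topic {name}")
    except Exception as exc:
        print(f"{name}: {exc}")
```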
✔ Periodic side jobs
Checkpoint cleanup
Daily table optimizations
Partition compaction
Metric aggregation
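For example, a daily maintenance task on Databricks might run Delta Lake housekeeping commands like these (the table name and retention window are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("daily-maintenance").getOrCreate()

# Compact the small files written by the stream and cluster by a common filter column
spark.sql("OPTIMIZE events_bronze ZORDER BY (event_time)")

# Remove data files no longer referenced by the table (7-day retention here)
spark.sql("VACUUM events_bronze RETAIN 168 HOURS")
```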
✔ Reprocessing jobs
Orchestrator may trigger:
“run this job over historical data”
“backfill from S3 for last 30 days”
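The reprocessing job itself is ordinary batch code, so it fits an orchestrator-triggered run naturally. A sketch, assuming raw events live on S3 in a hypothetical layout partitioned by an `event_date` column:

```python
from datetime import datetime, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-backfill").getOrCreate()

# Reprocess the last 30 days of raw events (bucket layout is hypothetical)
cutoff = (datetime.utcnow() - timedelta(days=30)).date().isoformat()

(
    spark.read.parquet("s3a://my-bucket/raw/events/")
    .where(F.col("event_date") >= cutoff)
    .write.mode("overwrite")
    .parquet("s3a://my-bucket/reprocessed/events/")
)
```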
✔ Cleanup / Stop / Resume
The orchestrator helps coordinate these operations, but it doesn't run the core stream itself.
✅ Real-World Example
Spark Structured Streaming pipeline
Streaming service: runs 24/7 on Databricks jobs or Kubernetes
Airflow orchestration:
Deploy new version of streaming code
Trigger restart with zero downtime
Run daily maintenance (optimize Delta tables)
Trigger a historical backfill job (batch)
Flink pipeline
Flink JobManager handles the actual stream
Airflow deploys/updates the job, monitors metrics
Kafka Streams application
Runs as microservice
Prefect monitors health and handles restarts
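A sketch of that last setup (the health endpoint is a hypothetical Spring Boot actuator URL; restart logic is omitted): a small Prefect flow, deployed on a short schedule, checks the Kafka Streams service and fails with retries when it reports unhealthy.

```python
import requests
from prefect import flow, task

# Hypothetical health endpoint exposed by the Kafka Streams microservice
HEALTH_URL = "http://orders-streams-app:8080/actuator/health"


@task(retries=3, retry_delay_seconds=60)
def check_health():
    resp = requests.get(HEALTH_URL, timeout=10)
    resp.raise_for_status()
    if resp.json().get("status") != "UP":
        raise RuntimeError("Kafka Streams app reports an unhealthy state")


@flow
def kafka_streams_healthcheck():
    check_health()


if __name__ == "__main__":
    # One-off run; in practice this flow would be deployed on a short schedule
    kafka_streams_healthcheck()
```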