Dataflow
Dataflow (Apache Beam)
What it is: Google Cloud's managed service for running Apache Beam pipelines. In this stack, it is the processing engine that reads from Pub/Sub.
The Philosophy: "Unified Batch and Stream." You write your pipeline logic once with the Apache Beam SDK, then run that same code against a streaming source (Pub/Sub) or a batch source (files in Cloud Storage). Only the read step changes.
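To make the "write once" idea concrete, here is a minimal plain-Python sketch. It does not use the Beam SDK: a real pipeline would express this as transforms over PCollections. The function name `count_by_user`, the event shape, and both sources are hypothetical stand-ins.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_by_user(events: Iterable[dict]) -> Counter:
    """Processing logic written once, against any iterable of events."""
    counts = Counter()
    for event in events:
        counts[event["user"]] += 1
    return counts


# Bounded source: stands in for records parsed from Cloud Storage files.
batch_events = [{"user": "a"}, {"user": "b"}, {"user": "a"}]


def stream_events() -> Iterator[dict]:
    """Stands in for messages arriving one at a time from Pub/Sub."""
    for user in ["a", "b", "a"]:
        yield {"user": user}


# The same function runs unchanged on both kinds of source.
batch_result = count_by_user(batch_events)
stream_result = count_by_user(stream_events())
```

The generator here is finite so that the example terminates. A real stream never ends, which is why streaming pipelines group results into windows rather than waiting for a final total.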
Data Engineer Note: This is Google's pride and joy. Because it groups data by event time (when something happened) rather than arrival time, it handles "late data" (e.g., a mobile phone loses signal and sends data 1 hour late) better than almost any other tool.
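The late-data behavior can be sketched as a small simulation. This is a conceptual model in plain Python, not the Beam API. The window size, the one-hour lateness limit, and the "highest event time seen so far" watermark are simplifying assumptions. In Beam, the watermark is estimated by the runner and the lateness limit is a setting on the windowing step.

```python
from collections import defaultdict

WINDOW_SECONDS = 60       # fixed one-minute event-time windows (assumed)
ALLOWED_LATENESS = 3600   # keep windows open one extra hour (assumed)


def window_start(event_time: int) -> int:
    """Start of the fixed window that contains this event time."""
    return event_time - (event_time % WINDOW_SECONDS)


def process(events):
    """Count events per event-time window, dropping ones that are too late.

    events: (event_time, arrival_time) pairs in arrival order.
    arrival_time is carried only to show the delay; the logic ignores it.
    """
    counts = defaultdict(int)
    dropped = []
    watermark = 0
    for event_time, arrival_time in events:
        # Simplified watermark: the highest event time seen so far.
        watermark = max(watermark, event_time)
        window_end = window_start(event_time) + WINDOW_SECONDS
        if window_end + ALLOWED_LATENESS <= watermark:
            dropped.append((event_time, arrival_time))  # window already closed
        else:
            counts[window_start(event_time)] += 1       # lands in its own window
    return dict(counts), dropped


events = [
    (10, 12),      # on time
    (3000, 3001),  # on time
    (30, 3005),    # about 50 minutes late, still within allowed lateness
    (8000, 8001),  # on time
    (45, 8002),    # more than 2 hours late, beyond allowed lateness
]
counts, dropped = process(events)
```

The event stamped 30 arrives long after its window's nominal end but is still counted in window 0, where it belongs. The event stamped 45 arrives after the lateness limit has passed and is dropped.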
AWS Equivalent: Amazon Managed Service for Apache Flink (for streaming) or AWS Glue (for batch).