# Dataflow

***

**Dataflow (Apache Beam)**

* What it is: Google Cloud's fully managed service for running Apache Beam pipelines. In this stack, it is the processing engine that reads from Pub/Sub.
* The Philosophy: "Unified Batch and Stream." You write your pipeline once (using the Apache Beam SDK). You can run that same pipeline logic on a streaming source (Pub/Sub) or a batch source (Cloud Storage files); only the source changes.
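* Data Engineer Note: This is Google's pride and joy. Its standout feature is how it handles "late data" (e.g., a mobile phone loses signal and sends data 1 hour late). Beam groups records by event time (when something happened) rather than arrival time, and uses watermarks, triggers, and allowed lateness to decide whether a late record still updates its window.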
* AWS Equivalent: Amazon Managed Service for Apache Flink (for streaming) or AWS Glue (for batch).
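
To make the "late data" and "write once" ideas above concrete, here is a minimal plain-Python sketch. It is a toy model, not the Apache Beam SDK: the function `count_per_window`, the `(event_time_seconds, payload)` event shape, and the simplified watermark (the largest event time seen so far) are all assumptions made for illustration. Real Beam pipelines express this with windowing, triggers, and allowed lateness on a `PCollection`, and the runner estimates the watermark.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (event_time_seconds, payload).
Event = Tuple[int, str]


def count_per_window(
    events: Iterable[Event],
    window_s: int = 60,
    allowed_lateness_s: int = 3600,
) -> Tuple[Dict[int, int], List[Event]]:
    """Count events per fixed event-time window; drop events that are too late."""
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[Event] = []
    watermark = 0  # toy watermark (assumption): the largest event time seen so far

    for event_time, payload in events:  # iteration order stands in for arrival order
        # Assign the event to a window by when it happened, not when it arrived.
        window_start = event_time - event_time % window_s
        window_end = window_start + window_s
        # Too late: the watermark has passed the window end plus allowed lateness.
        if watermark >= window_end + allowed_lateness_s:
            dropped.append((event_time, payload))
            continue
        counts[window_start] += 1
        watermark = max(watermark, event_time)

    return dict(counts), dropped


arrivals: List[Event] = [
    (10, "a"),
    (70, "b"),
    (3000, "c"),
    (30, "d"),    # about 50 minutes late, still inside the allowed lateness
    (4000, "e"),
    (45, "f"),    # arrives after window end + allowed lateness, so it is dropped
]

# The same function runs over a finite list ("batch") or an iterator ("stream").
batch_counts, batch_dropped = count_per_window(arrivals)
stream_counts, stream_dropped = count_per_window(iter(arrivals))

print(batch_counts)   # {0: 2, 60: 1, 3000: 1, 3960: 1}
print(batch_dropped)  # [(45, 'f')]
```

The late record `"d"` still lands in the window starting at second 0 because it is counted by event time, while `"f"` shows up after the one-hour lateness budget and is discarded. Choosing that budget is the usual trade-off: a longer allowed lateness keeps more stragglers but forces the engine to hold window state for longer.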

***
