Airflow


Best practices for debugging and testing Airflow pipelines webinar from Astronomer

https://airflow.apache.org/docs/

Astronomer academy courses

Official How-to Guides


Airflow 3.x Components

In Airflow 3.x, the API Server is no longer just for the UI; it is the mandatory gatekeeper for the entire system.

  • The API Server (The Hub): In 3.x, this is the only path to the Metadata DB during task execution. Workers and tasks no longer hold a direct database connection; they are "DB-less."

  • The Metadata Database: Stores the state. Access is restricted to the Scheduler, DAG File Processor, and API Server.

  • The Scheduler: Focuses on the "State Machine." It decides when a task is SCHEDULED and moves it to the Executor.

  • The Executor: The strategy layer (Celery, Kubernetes, Local, or the new Edge Executor).

  • The DAG File Processor: Now often runs as a standalone, isolated process for better security, ensuring user code doesn't crash the Scheduler.

  • The Queue: Usually an external broker like Redis or RabbitMQ (for Celery) or an internal list (for Local).

  • The Worker / Task SDK: In 3.x, the worker uses the Airflow Task SDK. This allows the task to be "language-aware" and communicate status via JSON over HTTP to the API Server.

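  • Triggerer: Runs the asynchronous triggers for deferred tasks (deferrable operators) in an asyncio event loop, so a waiting task does not occupy a worker slot.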
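
The flow described in the list above can be sketched as a small in-process toy model. This is only an illustration under stated assumptions, not Airflow code: the class names (MetadataDB, ApiServer, Scheduler, Worker), the handle() method, and the JSON payload shape are all hypothetical, the HTTP hop is replaced by a plain method call that passes a JSON string, and the Executor is collapsed into a simple queue. The point it tries to show is that the worker is only ever handed the API server, never the database.

```python
import json
from collections import deque


class MetadataDB:
    """Toy stand-in for the metadata database: maps task_id -> state."""

    def __init__(self):
        self._states = {}

    def set_state(self, task_id, state):
        self._states[task_id] = state

    def get_state(self, task_id):
        return self._states.get(task_id)


class ApiServer:
    """Hypothetical hub: the only object a worker is ever given."""

    def __init__(self, db):
        self._db = db

    def handle(self, body):
        # Decode the JSON payload and persist the reported state.
        msg = json.loads(body)
        self._db.set_state(msg["task_id"], msg["state"])
        return json.dumps({"ok": True})


class Scheduler:
    """Marks a task as ready and puts it on the queue (executor elided)."""

    def __init__(self, db, queue):
        self._db = db
        self._queue = queue

    def schedule(self, task_id):
        self._db.set_state(task_id, "scheduled")
        self._queue.append(task_id)
        self._db.set_state(task_id, "queued")


class Worker:
    """DB-less worker: holds a reference to the API server, not the DB."""

    def __init__(self, api, queue):
        self._api = api
        self._queue = queue

    def _report(self, task_id, state):
        # Stand-in for "JSON over HTTP": serialize, send, parse the reply.
        reply = self._api.handle(json.dumps({"task_id": task_id, "state": state}))
        return json.loads(reply)["ok"]

    def run_next(self, fn):
        task_id = self._queue.popleft()
        self._report(task_id, "running")
        try:
            fn()
        except Exception:
            self._report(task_id, "failed")
        else:
            self._report(task_id, "success")
        return task_id


def demo():
    db = MetadataDB()
    queue = deque()
    api = ApiServer(db)
    scheduler = Scheduler(db, queue)
    worker = Worker(api, queue)

    scheduler.schedule("extract")
    scheduler.schedule("load")
    worker.run_next(lambda: None)   # task body completes normally
    worker.run_next(lambda: 1 / 0)  # task body raises
    return db.get_state("extract"), db.get_state("load")
```

Calling demo() schedules two tasks and runs them through the worker; the final states are read back from the toy database. The state names used here are a simplified subset chosen for the sketch.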

