Using Variables
In Airflow, Variables are a generic way to store and retrieve arbitrary content or settings as key-value pairs within the Airflow metadata database. They are essentially global constants that you can manage through the UI, CLI, or code.
What are Variables used for?
Variables are typically used for configuration data that is runtime-dependent but doesn't change frequently. Common use cases include:
File Paths: Storing the base path for your data lake or landing zone.
API Configuration: Storing endpoints or non-sensitive configuration keys.
Feature Flags: Enabling or disabling specific logic in a DAG without redeploying code.
Project Constants: Shared values used across multiple different DAGs in the same environment.
Managing Variables
You have three primary ways to interact with Variables:
The UI (User Interface)
The most common way to manage them is via Admin -> Variables. Here you can see a list of all keys, their values, and even "Bulk Import" a JSON file.
The CLI (Command Line Interface)
As we discussed, the CLI is great for automation and migration:
Export:
airflow variables export vars.json
Import:
airflow variables import vars.json
Set/Get:
airflow variables set my_key my_value
airflow variables get my_key
Environment Variables
You can also set Variables via your OS environment using the prefix AIRFLOW_VAR_. For example, setting AIRFLOW_VAR_S3_BUCKET=my-data makes a Variable named s3_bucket available in Airflow. This is often faster and more secure for CI/CD pipelines.
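A quick sketch of that round trip, reusing the s3_bucket example above (the shell line stands in for whatever sets your CI/CD or container environment):

```bash
export AIRFLOW_VAR_S3_BUCKET=my-data
```

```python
from airflow.models import Variable

# Resolved from the AIRFLOW_VAR_S3_BUCKET environment variable,
# without touching the metadata database.
bucket = Variable.get("s3_bucket")  # "my-data"
```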
How to use Variables in Code
There are two ways to pull a Variable into your DAG:
Method A: The Python Library (Variable.get)
Use this inside your Python functions or at the top level of your DAG (though be careful with top-level calls as they hit the DB every time the DAG is parsed).
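A minimal sketch, assuming a Variable named taxi_data_path exists (the key and the default value are illustrative):

```python
from airflow.models import Variable

def extract_data():
    # Fetching inside the function means the database lookup happens
    # at task runtime, not every time the scheduler parses this file.
    data_path = Variable.get("taxi_data_path", default_var="/tmp/data")
    print(f"Reading from {data_path}")
```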
Method B: Jinja Templating (Recommended for Operators)
This is the most efficient way to use Variables in traditional operators because it only retrieves the value at runtime, not during DAG parsing.
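For example, in a templated operator field (the dag_id and key are illustrative; {{ var.value.<key> }} is Airflow's standard Jinja syntax for Variables):

```python
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="variables_demo",
    start_date=pendulum.datetime(2024, 1, 1),
    schedule=None,
) as dag:
    # The template is rendered at runtime, so parsing this file
    # never queries the Variables table.
    print_path = BashOperator(
        task_id="print_path",
        bash_command="echo {{ var.value.taxi_data_path }}",
    )
```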
Variables vs. XComs: The Key Distinction
Since you've been learning about XComs, it is important not to confuse them with Variables:
| Feature | Variables | XComs |
| --- | --- | --- |
| Scope | Global: available to all DAGs and all tasks. | Local: usually specific to a single DAG run. |
| Lifecycle | Persistent: stays in the DB until you delete it. | Transient: created during a task run. |
| Purpose | Configuration: settings, paths, and flags. | Communication: passing data between tasks. |
A Note on Security (Secret Masking)
Airflow has a built-in security feature for Variables. If you create a variable with a key that contains words like "password", "secret", "key", or "auth", the Airflow UI will automatically mask the value with asterisks (****).
Pro-Tip: While Variables can be masked in the UI, masking is cosmetic: the values still sit in the metadata database (encrypted only if you have configured a Fernet key). For highly sensitive credentials, like the taxi API key in this project, a Secrets Backend (HashiCorp Vault, AWS Secrets Manager) is safer than a standard Variable.
Storing complex configurations as Variables
Storing complex configurations in a single variable is a great way to keep your Admin -> Variables list clean. Instead of having five different variables for one project, you can store a single JSON object.
Using deserialize_json
When you store a value like {"api_key": "12345", "retries": 3, "timeout": 10} in a variable named taxi_config, you can pull it into your code as a native Python dictionary.
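A minimal sketch, assuming the taxi_config variable above has been created:

```python
from airflow.models import Variable

# deserialize_json=True parses the stored JSON string into a dict
config = Variable.get("taxi_config", deserialize_json=True)

api_key = config["api_key"]   # "12345"
retries = config["retries"]   # 3
timeout = config["timeout"]   # 10
```

In templated operator fields, the equivalent is {{ var.json.taxi_config.retries }}.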
What Exporting and Importing Variables Means
In Airflow, exporting and importing Variables means moving your configuration "knowledge" out of the database into a file (or vice versa). This is a critical skill for any Data Engineer moving code from a laptop to a production server.
Exporting Variables
Exporting takes all the key-value pairs you’ve created in the UI and saves them into a JSON file.
Why do it? To create a backup of your settings or to "template" an environment so another developer can replicate your setup.
How: Use the CLI command
airflow variables export my_vars.json
Importing Variables
Importing takes that file and injects those keys and values back into the Airflow metadata database.
Why do it? To quickly set up a new Airflow environment (like your Docker setup) without manually typing every path and API key into the UI.
How: Use the CLI command
airflow variables import my_vars.json
Key Differences in the Workflow
| Action | Direction | Common Format | Primary Use Case |
| --- | --- | --- | --- |
| Export | DB → Local File | .json | Backing up settings or migrating to production. |
| Import | Local File → DB | .json | Setting up a new environment or updating bulk settings. |
Pro-Tip for your Project
If you are using Docker, you can actually skip the manual "Import" step by placing your variables in a file and pointing Airflow to it, or by using Environment Variables. If you set an environment variable like AIRFLOW_VAR_TAXI_DATA_PATH=/data, Airflow will treat it as a variable automatically without you ever having to "import" it into the database.
Exporting and Importing: Example
To practice exporting and importing, you need to follow a specific JSON structure that Airflow recognizes.
The variables.json Structure
Create a file named variables.json. Notice how we can store simple strings, numbers, and even nested objects (which is where deserialize_json comes in).
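Here is one possible shape for that file (the key names are illustrative; taxi_config mirrors the earlier example):

```json
{
  "taxi_data_path": "/opt/airflow/data/taxi",
  "batch_size": 500,
  "taxi_config": {
    "api_key": "12345",
    "retries": 3,
    "timeout": 10
  }
}
```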
The CLI Workflow
Once you have your file ready, open your terminal (if you are using Docker, remember to run these inside the container or via docker exec).
To Import:
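airflow variables import variables.json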
This command will read the file and create (or update) these entries in your Airflow database.
To Export:
If you make changes in the UI and want to save them back to your local machine, run:
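airflow variables export variables.json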
Understanding the Import "Override" logic
When you import variables, Airflow follows a specific behavior:
If the key is new: It creates the variable.
If the key already exists: It overwrites the existing value with the one from the file.
Why JSON Deserialization is a "Pro" Move
By grouping your api_endpoint, retries, and batch_size inside a single taxi_config JSON object, you gain two big advantages:
Atomicity: You update all related settings at once by importing one file.
Organization: Your Airflow UI doesn't get cluttered with 50 individual variables. Instead, you see one clear configuration object per project.
Note: When you look at taxi_config in the Airflow UI after importing, it will look like a long string. But because you set deserialize_json=True in your code, Airflow handles the "translation" back into a Python dictionary for you.
Summary Checklist
[ ] Format: Ensure your JSON keys are strings and the file is valid JSON.
[ ] Environment: Use the CLI for bulk moves and the UI for quick single-value tweaks.
[ ] Security: Never export variables if they contain plain-text passwords unless you are moving them to a secure, encrypted volume.
Handling Environment Variables
In Docker Compose, you generally have two ways to set these environment variables, depending on how "permanent" you want them to be.
The docker-compose.yml (Hardcoded)
You can add them directly under the environment: section of your Airflow services. This is great for local development because the settings are "baked into" your infrastructure; a sketch follows the notes below.
Key Rule: Any variable prefixed with AIRFLOW_VAR_ is automatically picked up as an Airflow Variable.
Case Sensitivity: The suffix must match your code: AIRFLOW_VAR_MY_KEY becomes the variable my_key (Airflow lowercases it).
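For example (service name and paths are illustrative; in the official docker-compose.yaml these usually live in the shared x-airflow-common environment block so every service sees them):

```yaml
services:
  airflow-scheduler:
    environment:
      AIRFLOW_VAR_TAXI_DATA_PATH: /opt/airflow/data/taxi
      AIRFLOW_VAR_BATCH_SIZE: "500"
```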
The .env File (The "Cleaner" Way)
Rather than cluttering your YAML file, you can use a .env file in the same directory. Docker Compose reads this file automatically for variable substitution, but note that the values only reach the containers if your services reference them (for example via env_file: or an environment: mapping).
In .env:
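```
# .env (values illustrative)
AIRFLOW_VAR_TAXI_DATA_PATH=/opt/airflow/data/taxi
AIRFLOW_VAR_API_KEY=12345
```

And in docker-compose.yml, point the Airflow services at the file (a minimal sketch):

```yaml
services:
  airflow-scheduler:
    env_file:
      - .env
```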
Other Setup Types
If you move beyond local Docker, the "where" changes, but the AIRFLOW_VAR_ prefix trick stays the same:
Kubernetes (Helm): You define them in your values.yaml file under the env section (see the sketch after this list).
Managed Services (MWAA/Cloud Composer): You set them in the AWS or GCP Console under "Environment Variables."
Standard Linux (systemd): You export them in your .bashrc or the service configuration file: export AIRFLOW_VAR_MY_KEY=value
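A minimal values.yaml sketch, assuming the official apache-airflow Helm chart (the key name is illustrative):

```yaml
env:
  - name: AIRFLOW_VAR_TAXI_DATA_PATH
    value: "/opt/airflow/data/taxi"
```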
Important: The "Precedence" Rule
If you have the same variable defined in multiple places, Airflow follows a specific order of "who wins":
Secrets Backends (Vault, AWS Secrets Manager) → The Winner, if one is configured.
Environment Variables (AIRFLOW_VAR_...) → checked next.
The Metadata Database (Variables you typed into the UI or imported via CLI) → checked last.
Wait, why is this important? If you set a variable in docker-compose.yml, you cannot change it through the Airflow UI. It will appear to save in the UI, but the code will keep using the Docker value. This is a common "gotcha" for beginners!
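You can see this from inside the container (my_key is a throwaway example; airflow variables get resolves through the same precedence chain as Variable.get):

```bash
export AIRFLOW_VAR_MY_KEY=from_env    # simulates the docker-compose setting
airflow variables set my_key from_db  # writes to the metadata database
airflow variables get my_key          # prints "from_env": the env var wins
```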
Summary Table
Method
Best For...
UI Editable?
docker-compose.yml
Local dev constants.
No
.env file
Keeping secrets out of Git.
No
UI / CLI Import
Values you need to change frequently.
Yes
Secrets Backend
Production-grade security.
No