# Using Variables

***

In Airflow, **Variables** are a generic way to **store and retrieve arbitrary content or settings as key-value pairs** within the Airflow metadata database. They are essentially global constants that you can manage through the UI, CLI, or code.

***

### What are Variables used for?

Variables are typically used for configuration data that is environment-specific but changes infrequently. Common use cases include:

* **File Paths:** Storing the base path for your data lake or landing zone.
* **API Configuration:** Storing endpoints or non-sensitive configuration keys.
* **Feature Flags:** Enabling or disabling specific logic in a DAG without redeploying code (see the sketch after this list).
* **Project Constants:** Shared values used across multiple different DAGs in the same environment.
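
To make the feature-flag idea concrete, here is a minimal sketch (the `enable_backfill` key is hypothetical, stored as the string `"true"` or `"false"`):

```python
from airflow.decorators import task
from airflow.models import Variable

@task
def maybe_run_backfill():
    # Hypothetical flag; falls back to "false" if the Variable doesn't exist
    if Variable.get("enable_backfill", default_var="false") == "true":
        print("Backfill enabled: running extra logic...")
    else:
        print("Backfill disabled: skipping.")
```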

***

### Managing Variables

You have **three primary ways** to interact with Variables:

#### The UI (User Interface)

The most common way to manage them is via **Admin -> Variables**. Here you can see a list of all keys, their values, and even "Bulk Import" a JSON file.

#### The CLI (Command Line Interface)

As we discussed, the CLI is great for automation and migration:

* **Export:** `airflow variables export vars.json`
* **Import:** `airflow variables import vars.json`
* **Set/Get:** `airflow variables set my_key my_value` and `airflow variables get my_key`

#### Environment Variables

You can also set Variables via your OS environment using the prefix `AIRFLOW_VAR_`. For example, setting `AIRFLOW_VAR_S3_BUCKET=my-data` makes a Variable named `s3_bucket` available in Airflow. Because these lookups never touch the metadata database, this approach is often faster and a good fit for CI/CD pipelines.
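
A task can then read it exactly like a database-backed Variable. A minimal sketch, assuming `AIRFLOW_VAR_S3_BUCKET=my-data` is present in the scheduler and worker environment:

```python
from airflow.models import Variable

# Resolves from the AIRFLOW_VAR_S3_BUCKET environment variable;
# no row is read from (or written to) the metadata database.
bucket = Variable.get("s3_bucket")
print(bucket)  # my-data
```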

***

### How to use Variables in Code

There are two ways to pull a Variable into your DAG:

#### Method A: The Python API (Variable.get)

Use this inside your Python functions or at the top level of your DAG (though be careful with top-level calls as they hit the DB every time the DAG is parsed).

```python
from airflow.models import Variable

# Retrieve a variable
data_path = Variable.get("taxi_data_path")

# Retrieve and automatically parse JSON
config = Variable.get("api_config", deserialize_json=True)
```
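
`Variable.get` also accepts a `default_var` argument, which is useful when a Variable may not exist yet (for example, on a freshly created environment):

```python
from airflow.models import Variable

# Missing keys normally raise KeyError; default_var gives a safe fallback
data_path = Variable.get("taxi_data_path", default_var="/opt/airflow/data")
```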

#### Method B: Jinja Templating (Recommended for Operators)

This is the most efficient way to use Variables in traditional operators because it only retrieves the value at **runtime**, not during DAG parsing.

```python
from airflow.operators.bash import BashOperator

fetch_task = BashOperator(
    task_id="fetch_data",
    bash_command="echo 'Fetching from {{ var.value.taxi_data_path }}'",
)
```
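
For Variables that hold JSON, templates also expose a `var.json` accessor, so you can reach nested fields at runtime. In this sketch, `endpoint` is a hypothetical field inside the `api_config` Variable from earlier:

```python
from airflow.operators.bash import BashOperator

# var.json deserializes the Variable's JSON, so nested keys are addressable
report_task = BashOperator(
    task_id="report_endpoint",
    bash_command="echo 'Calling {{ var.json.api_config.endpoint }}'",
)
```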

***

### Variables vs. XComs: The Key Distinction

Since you've been learning about XComs, it is important not to confuse them with Variables:

| **Feature**   | **Variables**                                        | **XComs**                                        |
| ------------- | ---------------------------------------------------- | ------------------------------------------------ |
| **Scope**     | **Global:** Available to all DAGs and all tasks.     | **Local:** Usually specific to a single DAG run. |
| **Lifecycle** | **Persistent:** Stays in the DB until you delete it. | **Transient:** Created during a task run.        |
| **Purpose**   | **Configuration:** Settings, paths, and flags.       | **Communication:** Passing data between tasks.   |

***

### A Note on Security (Secret Masking)

Airflow has a **built-in security feature** for Variables. If you create a variable whose key contains a sensitive word such as "password", "secret", "api_key", or "token", the Airflow UI will automatically mask the value with asterisks (`****`).

> **Pro-Tip:** While Variables can be masked, they are stored in the database as plain text unless you have configured a Secrets Backend (like HashiCorp Vault or AWS Secrets Manager). For highly sensitive taxi API credentials, a Secrets Backend is safer than a standard Variable.

***

### Storing complex configurations as Variables

Storing complex configurations in a single variable is a great way to keep your **Admin -> Variables** list clean. Instead of having five different variables for one project, you can store a single JSON object.

#### Using `deserialize_json`

When you store a value like `{"api_key": "12345", "retries": 3, "timeout": 10}` in a variable named `taxi_config`, you can pull it into your code as a native Python dictionary.

```python
from airflow.models import Variable

# Without deserialize_json, this would just be a string
# With it, 'config' becomes a dictionary
config = Variable.get("taxi_config", deserialize_json=True)

api_key = config["api_key"]
print(f"Using API Key: {api_key}")
```

***

### What Exporting and Importing Variables Means

In Airflow, exporting and importing is the process of moving your configuration "knowledge" out of the database and into a file (or vice versa). This is a critical skill for any Data Engineer moving code from a laptop to a production server.

#### Exporting Variables

Exporting takes all the key-value pairs you’ve created in the UI and saves them into a **JSON** file.

* **Why do it?** To create a backup of your settings or to "template" an environment so another developer can replicate your setup.
* **How:** Use the CLI command `airflow variables export my_vars.json`.

#### Importing Variables

Importing takes that file and injects those keys and values back into the Airflow metadata database.

* **Why do it?** To quickly set up a new Airflow environment (like your Docker setup) without manually typing every path and API key into the UI.
* **How:** Use the CLI command `airflow variables import my_vars.json`.

#### Key Differences in the Workflow

| **Action** | **Direction**                 | **Common Format**  | **Primary Use Case**                                    |
| ---------- | ----------------------------- | ------------------ | ------------------------------------------------------- |
| **Export** | DB -> Local File | `.json`            | Backing up settings or migrating to production.         |
| **Import** | Local File -> DB | `.json`            | Setting up a new environment or updating bulk settings. |

***

#### Pro-Tip for your Project

If you are using **Docker**, you can actually skip the manual "Import" step by placing your variables in a file and pointing Airflow to it, or by using **Environment Variables**. If you set an environment variable like `AIRFLOW_VAR_TAXI_DATA_PATH=/data`, Airflow will treat it as a variable automatically without you ever having to "import" it into the database.

***

### Exporting and Importing: Example

To practice exporting and importing, you need to follow a specific JSON structure that Airflow recognizes.

#### The `variables.json` Structure

Create a file named `variables.json`. Notice how we can store simple strings, booleans, and even nested objects (which is where `deserialize_json` comes in).

```json
{
    "taxi_data_path": "/opt/airflow/data",
    "is_production": false,
    "taxi_config": {
        "api_endpoint": "https://api.taxi-data.com/v1",
        "retries": 5,
        "batch_size": 1000
    }
}
```
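
One subtlety: when this file is imported, non-string values such as `false` and the nested object are stored as JSON strings, so you need `deserialize_json=True` to get native Python types back. A quick sketch:

```python
from airflow.models import Variable

# Without deserialize_json you'd get the raw string "false";
# with it, you get a proper Python bool / dict.
is_production = Variable.get("is_production", deserialize_json=True)  # False
config = Variable.get("taxi_config", deserialize_json=True)           # dict
print(config["batch_size"])  # 1000
```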

***

#### The CLI Workflow

Once you have your file ready, open your terminal (if you are using Docker, remember to run these inside the container or via `docker exec`).

**To Import:**

This command will read the file and create (or update) these entries in your Airflow database.

```bash
airflow variables import variables.json
```

**To Export:**

If you make changes in the UI and want to save them back to your local machine, run:

```bash
airflow variables export my_backup_vars.json
```

***

#### Understanding the Import "Override" logic

When you import variables, Airflow follows a specific behavior:

* **If the key is new:** It creates the variable.
* **If the key already exists:** It overwrites the existing value with the one from the file.
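
This is classic "upsert" behavior, and the Python API mirrors it: calling `Variable.set` on an existing key silently overwrites the old value. A small sketch:

```python
from airflow.models import Variable

Variable.set("taxi_data_path", "/opt/airflow/old_data")
Variable.set("taxi_data_path", "/opt/airflow/data")  # same key: overwritten

print(Variable.get("taxi_data_path"))  # /opt/airflow/data
```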

***

#### Why JSON Deserialization is a "Pro" Move

By grouping your `api_endpoint`, `retries`, and `batch_size` inside a single `taxi_config` JSON object, you gain two big advantages:

1. **Atomicity:** You update all related settings at once by importing one file.
2. **Organization:** Your Airflow UI doesn't get cluttered with 50 individual variables. Instead, you see one clear configuration object per project.

> **Note:** When you look at `taxi_config` in the Airflow UI after importing, it will look like a long string. But because you set `deserialize_json=True` in your code, Airflow handles the "translation" back into a Python dictionary for you.

***

#### Summary Checklist

* **\[ ] Format:** Ensure your JSON keys are strings and the file is valid JSON.
* **\[ ] Tooling:** Use the CLI for bulk moves and the UI for quick single-value tweaks.
* **\[ ] Security:** Never export variables if they contain plain-text passwords unless you are moving them to a secure, encrypted volume.

***

### Handling Environment Variables

In Docker Compose, you generally have two ways to set these variables, depending on how "permanent" you want them to be.

#### The `docker-compose.yml` (Hardcoded)

You can add them directly under the `environment:` section of your Airflow services. This is great for local development because the settings are "baked into" your infrastructure.

```yaml
services:
  airflow-common:
    ...
    environment:
      - AIRFLOW_VAR_TAXI_DATA_PATH=/opt/airflow/data
      # The JSON value contains ": ", so the entry must be quoted to stay valid YAML
      - 'AIRFLOW_VAR_TAXI_CONFIG={"api_key": "local_dev_123"}'
```

* **Key Rule:** Any variable prefixed with `AIRFLOW_VAR_` is automatically picked up as an Airflow Variable.
* **Case Sensitivity:** The suffix must match your code. `AIRFLOW_VAR_MY_KEY` becomes the variable `my_key` (Airflow lowercases it).
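
With the compose file above, both values become readable from any DAG. A minimal check, assuming the container environment shown in the snippet:

```python
from airflow.models import Variable

# AIRFLOW_VAR_TAXI_DATA_PATH -> Variable "taxi_data_path"
path = Variable.get("taxi_data_path")  # /opt/airflow/data

# Env-var Variables can hold JSON, too
cfg = Variable.get("taxi_config", deserialize_json=True)
print(cfg["api_key"])  # local_dev_123
```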

***

#### The `.env` File (The "Cleaner" Way)

Rather than cluttering your YAML file, you can keep the values in a `.env` file in the same directory. Docker Compose reads this file automatically for variable substitution in the YAML; to get the values into the containers themselves, pass them through with `env_file: .env` on the service (or reference them under `environment:`).

In `.env`:

```bash
AIRFLOW_VAR_TAXI_DATA_PATH=/opt/airflow/data
AIRFLOW_VAR_TAXI_CONFIG={"api_key": "secret_from_env_file"}
```

***

#### Other Setup Types

If you move beyond local Docker, the "where" changes, but the `AIRFLOW_VAR_` prefix trick stays the same:

* **Kubernetes (Helm):** You define them in your `values.yaml` file under the `env` section.
* **Managed Services (MWAA/Cloud Composer):** You set them in the AWS or GCP Console under "Environment Variables."
* **Standard Linux (systemd):** You set them in the service unit file (`Environment="AIRFLOW_VAR_MY_KEY=value"`) or export them in the shell profile that launches Airflow: `export AIRFLOW_VAR_MY_KEY=value`.

***

#### Important: The "Precedence" Rule

If you have the same variable defined in multiple places, Airflow follows a specific order of "who wins":

1. **Secrets Backends (Vault, AWS Secrets Manager)** -> **The Winner.**
2. **Environment Variables (`AIRFLOW_VAR_...`).**
3. **The Metadata Database (Variables you typed into the UI or imported via CLI).**

> **Wait, why is this important?** If you set a variable in `docker-compose.yml`, you cannot change it through the Airflow UI. It will appear to save in the UI, but the code will keep using the Docker value (environment-sourced Variables don't even show up in the **Admin -> Variables** list). This is a common "gotcha" for beginners!
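
You can see this precedence in a quick experiment, for example in a Python shell inside the container (the `greeting` key is hypothetical, and in real setups the environment variable would come from docker-compose, not Python code):

```python
import os
from airflow.models import Variable

Variable.set("greeting", "from the database")
os.environ["AIRFLOW_VAR_GREETING"] = "from the environment"

# The environment variable shadows the database row
print(Variable.get("greeting"))  # from the environment
```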

***

#### Summary Table

| **Method**               | **Best For...**                       | **UI Editable?** |
| ------------------------ | ------------------------------------- | ---------------- |
| **`docker-compose.yml`** | Local dev constants.                  | No               |
| **`.env` file**          | Keeping secrets out of Git.           | No               |
| **UI / CLI Import**      | Values you need to change frequently. | Yes              |
| **Secrets Backend**      | Production-grade security.            | No               |

***
