# dbt

***

[Docs](https://docs.getdbt.com/)

[Best Practices](https://docs.getdbt.com/best-practices)

How [dbt Projects](https://docs.getdbt.com/docs/build/projects) are organized

[How should I style my project?](https://docs.getdbt.com/best-practices/how-we-style/0-how-we-style-our-dbt-projects)

[dbt with Airflow](https://docs.getdbt.com/guides/airflow-and-dbt-cloud?step=1)

[YouTube playlist about dbt](https://www.youtube.com/watch?v=5rNquRnNb4E&list=PLy4OcwImJzBLJzLYxpxaPUmCWp8j1esvT)

[Playlist about dbt by a DataTalks.Club instructor](https://www.youtube.com/playlist?list=PL12GPe_RDEGRfNWxAMj6ThiU0zO4OiJWT)

* [Building a Kimball dimensional model with dbt](https://docs.getdbt.com/blog/kimball-dimensional-model)
* [Data Vault 2.0 with dbt](https://docs.getdbt.com/blog/data-vault-with-dbt-cloud)
* [Medallion Architecture with dbt](https://tsaiprabhanj.medium.com/medallion-architecture-with-dbt-a40050743be3)
* [Normalized vs Denormalized](https://medium.com/analytics-vidhya/database-normalization-vs-denormalization-a42d211dd891)

***

[Model naming conventions](https://docs.getdbt.com/blog/stakeholder-friendly-model-names)

**Resources**

* [List of dbt commands](https://docs.getdbt.com/category/list-of-commands) (docs), [More](https://popsql.com/learn-dbt/dbt-docs)
* [dbt node selection syntax](https://docs.getdbt.com/reference/node-selection/syntax) (docs)
* [Introduction to SQL](https://learn.getdbt.com/courses/introduction-to-sql) (dbt course)
* [Anatomy of a SQL query](https://medium.com/@lomso.dzingwa/the-anatomy-of-a-sql-query-189dd0664851) (article)
* [Advanced SQL](https://medium.com/@datainsights17/subqueries-and-ctes-in-sql-aa1ff4b17686) (article)
* [Version Control basics](https://docs.getdbt.com/docs/cloud/git/version-control-basics) (docs)

***

### dbt Project

**dbt_project.yml**

This is your **project configuration file** that lives in the root of your dbt project. It defines:

* **Project identity**: The project name, version, and required dbt version
* **Directory structure**: Where dbt should look for models, tests, seeds, snapshots, macros, etc.
* **Model configurations**: Default settings like materialization strategies (table, view, incremental), schema names, and tags that apply to groups of models
* **Variables**: Project-wide variables that can be used in your models
* **Project-level settings**: Documentation paths, dispatch configurations, and other project-specific behaviors

Example of what you might configure:

```yaml
name: 'my_analytics'
models:
  my_analytics:
    staging:
      +materialized: view
    marts:
      +materialized: table
```
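
The same file can also declare project-wide variables (mentioned above) under a `vars` key, which models then read with `{{ var(...) }}`. A minimal sketch — the variable names below are illustrative, not from a real project:

```yaml
# dbt_project.yml — illustrative variable names
name: 'my_analytics'

vars:
  start_date: '2024-01-01'            # read in a model via {{ var("start_date") }}
  payment_methods: ['card', 'paypal'] # lists work too, e.g. for IN clauses
```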

**profiles.yml**

This is your **connection configuration file** that typically lives in your `~/.dbt/` directory (outside your project). It defines:

* **Database connections**: How to connect to your data warehouse (credentials, host, port, database name)
* **Targets**: Different environments like dev, staging, and production
* **Authentication details**: Username, password, or authentication method for each target

This file is kept separate (and typically not committed to version control) because it contains sensitive credentials.

**Example structure:**

```yaml
my_analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: abc123
      user: my_username
      password: my_password
      ...
```

**Note:** The `profiles.yml` file is where you define multiple environments (called "targets" in dbt terminology) for your project.

Here's how it works:

```yaml
my_project:
  target: dev  # default target to use
  outputs:
    dev:
      type: snowflake
      account: abc123
      database: analytics_dev
      schema: dbt_john
      ...
    
    staging:
      type: snowflake
      account: abc123
      database: analytics_staging
      schema: analytics
      ...
    
    prod:
      type: snowflake
      account: abc123
      database: analytics_prod
      schema: analytics
      ...
```

**Switching Between Environments**

You can switch between targets in a few ways:

1. **Command line flag**: `dbt run --target prod`
2. **Change the default**: Edit the `target:` line in profiles.yml
3. **Environment variable**: Set `DBT_TARGET=prod`

**Common Use Cases**

* **dev**: Personal development environment, often with your own schema
* **staging/qa**: Testing environment that mirrors production
* **prod**: Production environment with real data
* **ci**: Continuous integration environment for automated testing
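
Because `profiles.yml` holds credentials, a common pattern is to pull secrets from environment variables with dbt's built-in `env_var()` function instead of hard-coding them. A sketch, with placeholder account/user/database names:

```yaml
# ~/.dbt/profiles.yml — placeholder names; secrets come from the environment
my_analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: abc123
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics_dev
      schema: dbt_john
```

This keeps the file safe to share as a template, since the sensitive values only exist in each developer's shell environment.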

***

### The difference between dbt Core and dbt Fusion

dbt Fusion is dbt's new, high-performance execution engine, rebuilt entirely in Rust.

* **dbt Core is the classic engine (written in Python). Run commands via `dbt`**
* **dbt Fusion is the next-generation engine (written in Rust) designed to be significantly faster and more robust. Run commands via `dbtf`**

While the commands are 95% identical, the *behavior* and *performance* of those commands differ significantly.

***

### dbt commands (Core and Fusion)

[Running commands for specific resources in a given invocation](https://docs.getdbt.com/reference/node-selection/syntax)

#### The `dbt build` Command

Let's focus specifically on `dbt build`. This is the most important command in modern dbt workflows. Think of it as the "Smart Execute" button.

Before `dbt build` existed, you had to run separate commands blindly: `dbt seed`, then `dbt run`, then `dbt test`. If `dbt run` succeeded but `dbt test` failed 10 minutes later, you had bad data sitting in your warehouse that whole time.

`dbt build` **encapsulates the following commands** into a single, DAG-aware flow:

1. `dbt run`: Executes your models (tables/views).
2. `dbt test`: Runs data quality tests (`unique`, `not_null`, etc.).
3. `dbt seed`: Loads CSV files from your repository into the DB.
4. `dbt snapshot`: Updates your Type 2 Slowly Changing Dimension tables.
5. `dbt function` *(Fusion Exclusive)*: Compiles and runs User Defined Functions (UDFs).

**How it works (The "Fail-Fast" Mechanism)**

`dbt build` executes resources in **dependency order** (the DAG).

* Step A: It runs `model_A`.
* Step B: Immediately after `model_A` finishes, it runs tests defined on `model_A`.
* Step C:
  * If tests PASS: It proceeds to build `model_B` (which depends on A).
  * If tests FAIL: It skips `model_B`.

This **prevents "cascading failures"** where a broken upstream model causes 50 downstream models to build with garbage data.
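
The tests that `dbt build` interleaves with model runs are typically declared in a YAML file alongside the models. A minimal sketch — the model and column names here are hypothetical:

```yaml
# models/staging/schema.yml — hypothetical model and column names
version: 2

models:
  - name: model_A
    columns:
      - name: id
        tests:        # dbt build runs these immediately after model_A materializes
          - unique
          - not_null
```

If either test fails, anything downstream of `model_A` is skipped rather than built on bad data.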

***

#### dbt Core vs. dbt Fusion: Command Comparison

While the syntax is similar, dbt Fusion changes *how* these commands run.

| **Feature**     | **dbt Core (Python)**                                                         | **dbt Fusion (Rust)**                                                                       |
| --------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| CLI Command     | `dbt run`, `dbt build`                                                        | `dbtf run`, `dbtf build` (often aliased back to `dbt`)                                      |
| Execution Speed | Moderate. Python overhead can be slow for large project parsing.              | Blazing Fast. Rust binaries parse projects in milliseconds, not seconds.                    |
| Concurrency     | Single-threaded. Unsafe to run multiple commands at once in the same process. | Multi-threaded. Can safely run a `dbt parse` (read) while a `dbt build` (write) is running. |
| Compilation     | Slower Jinja rendering.                                                       | Optimized rendering (uses a Rust port of Jinja called MiniJinja).                           |
| UDF Support     | Limited/None.                                                                 | First-class support for User Defined Functions via `dbt function`.                          |

Note on the alias: when you install dbt Fusion, you might still type `dbt build`, but under the hood it is using the Rust binary. The binary is often named `dbtf` to disambiguate during the transition period.

***

#### More about dbt commands

These are the commands that `dbt build` is running for you behind the scenes.

**A. `dbt run`**

* Core: Compiles SQL and sends it to the warehouse.
* Fusion: Does the same but compiles the DAG significantly faster. Fusion is capable of resolving large dependency graphs (10,000+ models) with almost zero latency, whereas Core often "hangs" for a minute just figuring out what to run.

**B. `dbt test`**

* Core: Runs SQL queries that look for failing rows.
* Fusion: Same logic, but again, the overhead of generating the test SQL is reduced.

**C. `dbt parse`**

* Core: Reads your project files to understand the DAG. This runs automatically at the start of every command and is notoriously slow in Python for huge projects.
* Fusion: This is where Fusion shines. It can parse massive projects almost instantly because the parser is native Rust code.

**D. `dbt compile`**

While `dbt run` actually executes SQL against your database, `dbt compile` is the "behind-the-scenes" command that prepares the code for execution without actually touching your data.

Think of it as a translation step: it takes your flexible, dynamic Jinja code and turns it into pure, "boring" SQL that your database can understand.

**What happens during Compilation?**

When you run `dbt compile`, dbt performs three major tasks:

* **Jinja Resolution**: It replaces all `{{ ref('model_name') }}` and `{{ source(...) }}` tags with the actual database paths (e.g., `analytics.dev_schema.model_name`).
* **Macro Expansion**: If you have custom logic or loops (like generating a list of columns), dbt executes that logic and writes out the resulting SQL.
* **Manifest Creation**: It updates the `manifest.json` file, which is the "master map" of your entire project’s lineage and dependencies.
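
As a sketch of Jinja resolution, consider a simple model (the model and schema names below are hypothetical):

```sql
-- models/orders_summary.sql (source, with Jinja) — hypothetical names
select customer_id, count(*) as order_count
from {{ ref('stg_orders') }}
group by 1

-- After `dbt compile`, the file under target/compiled/ might read:
--   select customer_id, count(*) as order_count
--   from analytics.dev_schema.stg_orders
--   group by 1
```

The compiled SQL is plain, warehouse-ready text with every `ref` replaced by a fully qualified relation name, which is exactly what `dbt run` would execute.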

***

