# dbt

***

[Docs](https://docs.getdbt.com/)

[Best Practices](https://docs.getdbt.com/best-practices)

How [dbt Projects](https://docs.getdbt.com/docs/build/projects) are organized

[How should I style my project?](https://docs.getdbt.com/best-practices/how-we-style/0-how-we-style-our-dbt-projects)

[dbt with Airflow](https://docs.getdbt.com/guides/airflow-and-dbt-cloud?step=1)

[YouTube playlist about dbt](https://www.youtube.com/watch?v=5rNquRnNb4E&list=PLy4OcwImJzBLJzLYxpxaPUmCWp8j1esvT)

[Playlist about dbt by a DataTalks.Club instructor](https://www.youtube.com/playlist?list=PL12GPe_RDEGRfNWxAMj6ThiU0zO4OiJWT)

* [Building a Kimball dimensional model with dbt](https://docs.getdbt.com/blog/kimball-dimensional-model)
* [Data Vault 2.0 with dbt](https://docs.getdbt.com/blog/data-vault-with-dbt-cloud)
* [Medallion Architecture with dbt](https://tsaiprabhanj.medium.com/medallion-architecture-with-dbt-a40050743be3)
* [Normalized vs Denormalized](https://medium.com/analytics-vidhya/database-normalization-vs-denormalization-a42d211dd891)

***

[Model naming conventions](https://docs.getdbt.com/blog/stakeholder-friendly-model-names)

**Resources**

* [List of dbt commands](https://docs.getdbt.com/category/list-of-commands) (docs), [More](https://popsql.com/learn-dbt/dbt-docs)
* [dbt node selection syntax](https://docs.getdbt.com/reference/node-selection/syntax) (docs)
* [Introduction to SQL](https://learn.getdbt.com/courses/introduction-to-sql) (dbt course)
* [Anatomy of a SQL query](https://medium.com/@lomso.dzingwa/the-anatomy-of-a-sql-query-189dd0664851) (article)
* [Advanced SQL](https://medium.com/@datainsights17/subqueries-and-ctes-in-sql-aa1ff4b17686) (article)
* [Version Control basics](https://docs.getdbt.com/docs/cloud/git/version-control-basics) (docs)

***

### dbt Project

**dbt_project.yml**

This is your **project configuration file** that lives in the root of your dbt project. It defines:

* **Project identity**: The project name, version, and required dbt version
* **Directory structure**: Where dbt should look for models, tests, seeds, snapshots, macros, etc.
* **Model configurations**: Default settings like materialization strategies (table, view, incremental), schema names, and tags that apply to groups of models
* **Variables**: Project-wide variables that can be used in your models
* **Project-level settings**: Documentation paths, dispatch configurations, and other project-specific behaviors

Example of what you might configure:

```yaml
name: 'my_analytics'
models:
  my_analytics:
    staging:
      +materialized: view
    marts:
      +materialized: table
```
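
The same file can also declare project-wide variables (mentioned above) under a `vars` key, which models then read with `{{ var(...) }}`. A minimal sketch — the variable names below are illustrative, not from a real project:

```yaml
# dbt_project.yml — illustrative variable names
name: 'my_analytics'

vars:
  start_date: '2024-01-01'            # read in a model via {{ var("start_date") }}
  payment_methods: ['card', 'paypal'] # lists work too, e.g. for IN clauses
```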

**profiles.yml**

This is your **connection configuration file** that typically lives in your `~/.dbt/` directory (outside your project). It defines:

* **Database connections**: How to connect to your data warehouse (credentials, host, port, database name)
* **Targets**: Different environments like dev, staging, and production
* **Authentication details**: Username, password, or authentication method for each target

This file is kept separate (and typically not committed to version control) because it contains sensitive credentials.

**Example structure:**

```yaml
my_analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: abc123
      user: my_username
      password: my_password
      ...
```

**Note:** The `profiles.yml` file is where you define multiple environments (called "targets" in dbt terminology) for your project.

Here's how it works:

```yaml
my_project:
  target: dev  # default target to use
  outputs:
    dev:
      type: snowflake
      account: abc123
      database: analytics_dev
      schema: dbt_john
      ...
    
    staging:
      type: snowflake
      account: abc123
      database: analytics_staging
      schema: analytics
      ...
    
    prod:
      type: snowflake
      account: abc123
      database: analytics_prod
      schema: analytics
      ...
```

**Switching Between Environments**

You can switch between targets in a few ways:

1. **Command line flag**: `dbt run --target prod`
2. **Change the default**: Edit the `target:` line in profiles.yml
3. **Environment variable**: Set `DBT_TARGET=prod`

**Common Use Cases**

* **dev**: Personal development environment, often with your own schema
* **staging/qa**: Testing environment that mirrors production
* **prod**: Production environment with real data
* **ci**: Continuous integration environment for automated testing
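
Because `profiles.yml` holds credentials, a common pattern is to pull secrets from environment variables with dbt's built-in `env_var()` function instead of hard-coding them. A sketch, with placeholder account/user/database names:

```yaml
# ~/.dbt/profiles.yml — placeholder names; secrets come from the environment
my_analytics:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: abc123
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      database: analytics_dev
      schema: dbt_john
```

This keeps the file safe to share as a template, since the sensitive values only exist in each developer's shell environment.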

***

### The difference between dbt Core and dbt Fusion

dbt Fusion is dbt's new, high-performance execution engine, rebuilt entirely in Rust.

* **dbt Core is the classic engine (written in Python). Run commands via `dbt`**
* **dbt Fusion is the next-generation engine (written in Rust) designed to be significantly faster and more robust. Run commands via `dbtf`**

While the commands are 95% identical, the *behavior* and *performance* of those commands differ significantly.

***

### dbt commands (Core and Fusion)

[Running commands for specific resources in a given invocation](https://docs.getdbt.com/reference/node-selection/syntax)

#### The `dbt build` Command

Let's focus specifically on `dbt build`. This is the most important command in modern dbt workflows. Think of it as the "Smart Execute" button.

Before `dbt build` existed, you had to run separate commands blindly: `dbt seed`, then `dbt run`, then `dbt test`. If `dbt run` succeeded but `dbt test` failed 10 minutes later, you had bad data sitting in your warehouse that whole time.

`dbt build` **encapsulates the following commands** into a single, DAG-aware flow:

1. `dbt run`: Executes your models (tables/views).
2. `dbt test`: Runs data quality tests (`unique`, `not_null`, etc.).
3. `dbt seed`: Loads CSV files from your repository into the DB.
4. `dbt snapshot`: Updates your Type 2 Slowly Changing Dimension tables.
5. `dbt function` *(Fusion Exclusive)*: Compiles and runs User Defined Functions (UDFs).

**How it works (The "Fail-Fast" Mechanism)**

`dbt build` executes resources in **dependency order** (the DAG).

* Step A: It runs `model_A`.
* Step B: Immediately after `model_A` finishes, it runs tests defined on `model_A`.
* Step C:
  * If tests PASS: It proceeds to build `model_B` (which depends on A).
  * If tests FAIL: It skips `model_B`.

This **prevents "cascading failures"** where a broken upstream model causes 50 downstream models to build with garbage data.
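
The tests that `dbt build` interleaves with model runs are typically declared in a YAML file alongside the models. A minimal sketch — the model and column names here are hypothetical:

```yaml
# models/staging/schema.yml — hypothetical model and column names
version: 2

models:
  - name: model_A
    columns:
      - name: id
        tests:        # dbt build runs these immediately after model_A materializes
          - unique
          - not_null
```

If either test fails, anything downstream of `model_A` is skipped rather than built on bad data.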

***

#### dbt Core vs. dbt Fusion: Command Comparison

While the syntax is similar, dbt Fusion changes *how* these commands run.

| **Feature**     | **dbt Core (Python)**                                                         | **dbt Fusion (Rust)**                                                                       |
| --------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| CLI Command     | `dbt run`, `dbt build`                                                        | `dbtf run`, `dbtf build` (often aliased back to `dbt`)                                      |
| Execution Speed | Moderate. Python overhead can be slow for large project parsing.              | Blazing Fast. Rust binaries parse projects in milliseconds, not seconds.                    |
| Concurrency     | Single-threaded. Unsafe to run multiple commands at once in the same process. | Multi-threaded. Can safely run a `dbt parse` (read) while a `dbt build` (write) is running. |
| Compilation     | Slower Jinja rendering.                                                       | Optimized rendering (uses a Rust port of Jinja called MiniJinja).                           |
| UDF Support     | Limited/None.                                                                 | First-class support for User Defined Functions via `dbt function`.                          |

Note on the alias: when you install dbt Fusion, you might still type `dbt build`, but under the hood it is using the Rust binary. The binary is often named `dbtf` to disambiguate during the transition period.

***

#### More about dbt commands

These are the commands that `dbt build` is running for you behind the scenes.

**A. `dbt run`**

* Core: Compiles SQL and sends it to the warehouse.
* Fusion: Does the same but compiles the DAG significantly faster. Fusion is capable of resolving large dependency graphs (10,000+ models) with almost zero latency, whereas Core often "hangs" for a minute just figuring out what to run.

**B. `dbt test`**

* Core: Runs SQL queries that look for failing rows.
* Fusion: Same logic, but again, the overhead of generating the test SQL is reduced.

**C. `dbt parse`**

* Core: Reads your project files to understand the DAG. This runs automatically at the start of every command and is notoriously slow in Python for huge projects.
* Fusion: This is where Fusion shines. It can parse massive projects almost instantly because the parser is native Rust code.

**D. `dbt compile`**

While `dbt run` actually executes SQL against your database, `dbt compile` is the "behind-the-scenes" command that prepares the code for execution without actually touching your data.

Think of it as a translation step: it takes your flexible, dynamic Jinja code and turns it into pure, "boring" SQL that your database can understand.

**What happens during Compilation?**

When you run `dbt compile`, dbt performs three major tasks:

* **Jinja Resolution**: It replaces all `{{ ref('model_name') }}` and `{{ source(...) }}` tags with the actual database paths (e.g., `analytics.dev_schema.model_name`).
* **Macro Expansion**: If you have custom logic or loops (like generating a list of columns), dbt executes that logic and writes out the resulting SQL.
* **Manifest Creation**: It updates the `manifest.json` file, which is the "master map" of your entire project’s lineage and dependencies.
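
As a sketch of Jinja resolution, consider a simple model (the model and schema names below are hypothetical):

```sql
-- models/orders_summary.sql (source, with Jinja) — hypothetical names
select customer_id, count(*) as order_count
from {{ ref('stg_orders') }}
group by 1

-- After `dbt compile`, the file under target/compiled/ might read:
--   select customer_id, count(*) as order_count
--   from analytics.dev_schema.stg_orders
--   group by 1
```

The compiled SQL is plain, warehouse-ready text with every `ref` replaced by a fully qualified relation name, which is exactly what `dbt run` would execute.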

***

