dbt

Resources

  • Docs

  • Best Practices

  • How dbt Projects are organized

  • How should I style my project?

  • dbt with Airflow

  • YouTube playlist about dbt

  • Playlist about dbt by a DataTalks.Club instructor

  • Model naming conventions


dbt Project

dbt_project.yml

This is your project configuration file that lives in the root of your dbt project. It defines:

  • Project identity: The project name, version, and required dbt version

  • Directory structure: Where dbt should look for models, tests, seeds, snapshots, macros, etc.

  • Model configurations: Default settings like materialization strategies (table, view, incremental), schema names, and tags that apply to groups of models

  • Variables: Project-wide variables that can be used in your models

  • Project-level settings: Documentation paths, dispatch configurations, and other project-specific behaviors

Example of what you might configure:
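
A minimal, illustrative sketch (the project name, folder names, and variables below are placeholders, not taken from the original page):

```yaml
# dbt_project.yml (illustrative)
name: my_dbt_project            # project identity
version: "1.0.0"
require-dbt-version: ">=1.7.0"

profile: my_dbt_project         # which profile in profiles.yml to connect with

# directory structure: where dbt looks for each resource type
model-paths: ["models"]
seed-paths: ["seeds"]
snapshot-paths: ["snapshots"]
macro-paths: ["macros"]
test-paths: ["tests"]

# project-wide variables, usable in models via {{ var('start_date') }}
vars:
  start_date: "2024-01-01"

# default model configurations, applied by folder
models:
  my_dbt_project:
    staging:
      +materialized: view
      +schema: staging
    marts:
      +materialized: table
      +tags: ["core"]
```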

profiles.yml

This is your connection configuration file that typically lives in your ~/.dbt/ directory (outside your project). It defines:

  • Database connections: How to connect to your data warehouse (credentials, host, port, database name)

  • Targets: Different environments like dev, staging, and production

  • Authentication details: Username, password, or authentication method for each target

This file is kept separate (and typically not committed to version control) because it contains sensitive credentials.

Example structure:
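
A hedged sketch for a Postgres warehouse (the profile name, hosts, and credentials are placeholders):

```yaml
# ~/.dbt/profiles.yml (illustrative)
my_dbt_project:                # must match the profile: key in dbt_project.yml
  target: dev                  # default target when --target is not passed
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: my_user
      password: my_password
      dbname: analytics
      schema: dbt_my_user      # personal dev schema
      threads: 4
    prod:
      type: postgres
      host: prod-db.example.com
      port: 5432
      user: dbt_prod
      password: "{{ env_var('DBT_PROD_PASSWORD') }}"  # keep secrets in env vars
      dbname: analytics
      schema: analytics
      threads: 8
```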

Note: The profiles.yml file is where you define multiple environments (called "targets" in dbt terminology) for your project.

Here's how it works:

Switching Between Environments

You can switch between targets in a few ways:

  1. Command line flag: dbt run --target prod

  2. Change the default: Edit the target: line in profiles.yml

  3. Environment variable: template the target in profiles.yml with env_var() and set the variable in your shell (see the sketch below)
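
dbt has no built-in DBT_TARGET variable, so this is a pattern you wire up yourself (the variable name is just a convention). A minimal sketch:

```yaml
# profiles.yml — fall back to dev unless DBT_TARGET is set in the shell
my_dbt_project:
  target: "{{ env_var('DBT_TARGET', 'dev') }}"
  # outputs: dev / prod entries as in the example above
```

With this in place, running DBT_TARGET=prod dbt run selects the prod output without editing the file.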

Common Use Cases

  • dev: Personal development environment, often with your own schema

  • staging/qa: Testing environment that mirrors production

  • prod: Production environment with real data

  • ci: Continuous integration environment for automated testing


The difference between dbt Core and dbt Fusion

To be clear: dbt Fusion is the new, high-performance execution engine for dbt, rebuilt entirely in Rust.

  • dbt Core is the classic engine (written in Python). Run commands via dbt

  • dbt Fusion is the next-generation engine (written in Rust) designed to be significantly faster and more robust. Run commands via dbtf

While the two engines share an almost identical command-line interface, the behavior and performance of those commands differ significantly.


dbt commands (Core and Fusion)

Running commands for specific resources in a given invocation

The dbt build Command

Let's focus specifically on dbt build. This is the most important command in modern dbt workflows. Think of it as the "Smart Execute" button.

Before dbt build existed, you had to run separate commands blindly: dbt seed, then dbt run, then dbt test. If dbt run succeeded but dbt test failed 10 minutes later, you had bad data sitting in your warehouse that whole time.

dbt build encapsulates the following commands into a single, DAG-aware flow:

  1. dbt run: Executes your models (tables/views).

  2. dbt test: Runs data quality tests (unique, not_null, etc.).

  3. dbt seed: Loads CSV files from your repository into the DB.

  4. dbt snapshot: Updates your Type 2 Slowly Changing Dimension tables.

  5. dbt function (Fusion Exclusive): Compiles and runs User Defined Functions (UDFs).

How it works (The "Fail-Fast" Mechanism)

dbt build executes resources in dependency order (the DAG).

  • Step A: It runs model_A.

  • Step B: Immediately after model_A finishes, it runs tests defined on model_A.

  • Step C:

    • If tests PASS: It proceeds to build model_B (which depends on A).

    • If tests FAIL: It skips model_B.

This prevents "cascading failures" where a broken upstream model causes 50 downstream models to build with garbage data.
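
A few illustrative invocations (the model name is a placeholder; --select and --fail-fast are standard dbt flags):

```bash
# build seeds, models, snapshots, and tests for the whole project, in DAG order
dbt build

# build only model_A and everything downstream of it
dbt build --select model_A+

# abort the entire run at the first failure, instead of only skipping
# the nodes downstream of the failed one
dbt build --select model_A+ --fail-fast
```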


dbt Core vs. dbt Fusion: Command Comparison

While the syntax is similar, dbt Fusion changes how these commands run.

| Feature | dbt Core (Python) | dbt Fusion (Rust) |
| --- | --- | --- |
| CLI Command | dbt run, dbt build | dbtf run, dbtf build (often aliased back to dbt) |
| Execution Speed | Moderate. Python overhead can be slow for large project parsing. | Blazing fast. Rust binaries parse projects in milliseconds, not seconds. |
| Concurrency | Single-threaded. Unsafe to run multiple commands at once in the same process. | Multi-threaded. Can safely run a dbt parse (read) while a dbt build (write) is running. |
| Compilation | Slower Jinja rendering. | Optimized rendering (uses a Rust port of Jinja called MiniJinja). |
| UDF Support | Limited/None. | First-class support for User Defined Functions via dbt function. |

Note on the Alias: When you install dbt Fusion, you might still type dbt build, but under the hood it runs the Rust binary. The binary itself is often named dbtf to disambiguate during the transition period.


More about dbt commands

These are the commands that dbt build is running for you behind the scenes.

A. dbt run

  • Core: Compiles SQL and sends it to the warehouse.

  • Fusion: Does the same but compiles the DAG significantly faster. Fusion is capable of resolving large dependency graphs (10,000+ models) with almost zero latency, whereas Core often "hangs" for a minute just figuring out what to run.

B. dbt test

  • Core: Runs SQL queries that look for failing rows (see the sketch after this list).

  • Fusion: Same logic, but again, the overhead of generating the test SQL is reduced.
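
To make "failing rows" concrete, here is roughly what a built-in not_null test compiles to (table and column names are placeholders; the exact generated SQL varies by dbt version):

```sql
-- illustrative: a not_null test on customers.customer_id
-- the test fails if this query returns any rows
select customer_id
from analytics.dev_schema.customers
where customer_id is null
```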

C. dbt parse

  • Core: Reads your project files to understand the DAG. This runs automatically at the start of every command and is notoriously slow in Python for huge projects.

  • Fusion: This is where Fusion shines. It can parse massive projects almost instantly because the parser is native Rust code rather than interpreted Python.

dbt compile

While dbt run actually executes SQL against your database, dbt compile is the "behind-the-scenes" command that prepares the code for execution without actually touching your data.

Think of it as a translation step: it takes your flexible, dynamic Jinja code and turns it into pure, "boring" SQL that your database can understand.

What happens during Compilation?

When you run dbt compile, dbt performs three major tasks:

  • Jinja Resolution: It replaces all {{ ref('model_name') }} and {{ source(...) }} tags with the actual database paths (e.g., analytics.dev_schema.model_name); see the sketch after this list.

  • Macro Expansion: If you have custom logic or loops (like generating a list of columns), dbt executes that logic and writes out the resulting SQL.

  • Manifest Creation: It updates the manifest.json file, which is the "master map" of your entire project’s lineage and dependencies.
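
A before-and-after sketch of the Jinja resolution step (model, schema, and column names are placeholders):

```sql
-- models/orders_enriched.sql — what you write
select o.order_id, c.customer_name
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c
  on o.customer_id = c.customer_id

-- target/compiled/.../orders_enriched.sql — what dbt compile writes out
select o.order_id, c.customer_name
from analytics.dev_schema.stg_orders o
join analytics.dev_schema.stg_customers c
  on o.customer_id = c.customer_id
```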


