dbt
How dbt Projects are organized
How should I style my project?
Playlist about dbt by a DataTalks.Club instructor
Resources
List of dbt commands (docs), More
dbt node selection syntax (docs)
Introduction to SQL (dbt course)
Anatomy of a SQL query (article)
Advanced SQL (article)
Version Control basics (docs)
dbt Project
dbt_project.yml
This is your project configuration file that lives in the root of your dbt project. It defines:
Project identity: The project name, version, and required dbt version
Directory structure: Where dbt should look for models, tests, seeds, snapshots, macros, etc.
Model configurations: Default settings like materialization strategies (table, view, incremental), schema names, and tags that apply to groups of models
Variables: Project-wide variables that can be used in your models
Project-level settings: Documentation paths, dispatch configurations, and other project-specific behaviors
Example of what you might configure:
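A minimal sketch of a `dbt_project.yml` covering the pieces listed above. The project name `my_project`, the `staging`/`marts` folders, and the `start_date` variable are illustrative assumptions, not fixed conventions:

```yaml
name: my_project                 # project identity
version: '1.0.0'
require-dbt-version: '>=1.0.0'

profile: my_project              # which entry in profiles.yml to use

model-paths: ['models']          # directory structure
seed-paths: ['seeds']
snapshot-paths: ['snapshots']
macro-paths: ['macros']

models:
  my_project:
    staging:
      +materialized: view        # default materialization for staging models
    marts:
      +materialized: table
      +tags: ['nightly']

vars:
  start_date: '2020-01-01'       # project-wide variable, read with var('start_date')
```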
profiles.yml
This is your connection configuration file that typically lives in your ~/.dbt/ directory (outside your project). It defines:
Database connections: How to connect to your data warehouse (credentials, host, port, database name)
Targets: Different environments like dev, staging, and production
Authentication details: Username, password, or authentication method for each target
This file is kept separate (and typically not committed to version control) because it contains sensitive credentials.
Example structure:
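A sketch of a `profiles.yml` with two targets, assuming a Postgres warehouse (host names, users, and schemas are placeholders):

```yaml
my_project:                      # must match the `profile:` key in dbt_project.yml
  target: dev                    # default target
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dev_user
      password: "{{ env_var('DBT_PASSWORD') }}"   # keep secrets out of the file
      dbname: analytics
      schema: dbt_dev            # your personal dev schema
    prod:
      type: postgres
      host: prod-warehouse.example.com
      port: 5432
      user: prod_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: analytics
```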
Note: The profiles.yml file is where you define multiple environments (called "targets" in dbt terminology) for your project.
Here's how it works:
Switching Between Environments
You can switch between targets in a few ways:
Command line flag:
Command line flag: `dbt run --target prod`
Change the default: edit the `target:` line in `profiles.yml`
Environment variable: set `DBT_TARGET=prod`
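Put together, switching targets from the terminal looks like this (assuming a `prod` target is defined in your `profiles.yml`):

```shell
# One-off: override the target for a single invocation
dbt run --target prod

# Environment variable: applies to every dbt command in this shell session
export DBT_TARGET=prod
dbt run

# Default: edit the `target:` key in ~/.dbt/profiles.yml
```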
Common Use Cases
dev: Personal development environment, often with your own schema
staging/qa: Testing environment that mirrors production
prod: Production environment with real data
ci: Continuous integration environment for automated testing
The difference between dbt Core and dbt Fusion
To be clear: dbt Fusion is the new, high-performance execution engine for dbt, rebuilt entirely in Rust.
dbt Core is the classic engine (written in Python). Run commands via `dbt`.
dbt Fusion is the next-generation engine (written in Rust), designed to be significantly faster and more robust. Run commands via `dbtf`.
While the commands are 95% identical, the behavior and performance of those commands differ significantly.
dbt commands (Core and Fusion)
Running commands for specific resources in a given invocation
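dbt's node selection syntax (linked in the Resources above) lets you target a subset of your project in any invocation. A few common patterns, using a hypothetical model named `stg_orders`:

```shell
dbt run --select stg_orders           # just this model
dbt run --select stg_orders+          # this model and everything downstream
dbt run --select +stg_orders          # this model and everything upstream
dbt run --select tag:nightly          # all models tagged 'nightly'
dbt test --exclude stg_orders         # everything except this model
dbt build --select state:modified+    # changed models and their children (needs prior state artifacts)
```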
The dbt build Command
Let's focus specifically on dbt build. This is the most important command in modern dbt workflows. Think of it as the "Smart Execute" button.
Before dbt build existed, you had to run separate commands blindly: dbt seed, then dbt run, then dbt test. If dbt run succeeded but dbt test failed 10 minutes later, you had bad data sitting in your warehouse that whole time.
dbt build encapsulates the following commands into a single, DAG-aware flow:
dbt run: Executes your models (tables/views).
dbt test: Runs data quality tests (unique, not_null, etc.).
dbt seed: Loads CSV files from your repository into the DB.
dbt snapshot: Updates your Type 2 Slowly Changing Dimension tables.
dbt function (Fusion exclusive): Compiles and runs User Defined Functions (UDFs).
How it works (The "Fail-Fast" Mechanism)
dbt build executes resources in dependency order (the DAG).
Step A: It runs model_A.
Step B: Immediately after model_A finishes, it runs the tests defined on model_A.
Step C:
If tests PASS: it proceeds to build model_B (which depends on A).
If tests FAIL: it skips model_B.
This prevents "cascading failures" where a broken upstream model causes 50 downstream models to build with garbage data.
dbt Core vs. dbt Fusion: Command Comparison
While the syntax is similar, dbt Fusion changes how these commands run.
| Feature | dbt Core (Python) | dbt Fusion (Rust) |
|---|---|---|
| CLI Command | dbt run, dbt build | dbtf run, dbtf build (often aliased back to dbt) |
| Execution Speed | Moderate. Python overhead can be slow for large project parsing. | Blazing fast. Rust binaries parse projects in milliseconds, not seconds. |
| Concurrency | Single-threaded. Unsafe to run multiple commands at once in the same process. | Multi-threaded. Can safely run a dbt parse (read) while a dbt build (write) is running. |
| Compilation | Slower Jinja rendering. | Optimized rendering (uses a Rust port of Jinja called MiniJinja). |
| UDF Support | Limited/none. | First-class support for User Defined Functions via dbt function. |
Note on the alias: When you install dbt Fusion, you might still type dbt build, but under the hood it uses the Rust binary. The binary itself is often named dbtf to disambiguate during the transition period.
More about dbt commands
These are the commands that dbt build is running for you behind the scenes.
A. dbt run
Core: Compiles SQL and sends it to the warehouse.
Fusion: Does the same but compiles the DAG significantly faster. Fusion is capable of resolving large dependency graphs (10,000+ models) with almost zero latency, whereas Core often "hangs" for a minute just figuring out what to run.
B. dbt test
Core: Runs SQL queries that look for failing rows.
Fusion: Same logic, but again, the overhead of generating the test SQL is reduced.
C. dbt parse
Core: Reads your project files to understand the DAG. This runs automatically at the start of every command and is notoriously slow in Python for huge projects.
Fusion: This is where Fusion shines. It can parse massive projects almost instantly because it's reading files with Rust.
dbt compile
While dbt run actually executes SQL against your database, dbt compile is the "behind-the-scenes" command that prepares the code for execution without actually touching your data.
Think of it as a translation step: it takes your flexible, dynamic Jinja code and turns it into pure, "boring" SQL that your database can understand.
What happens during Compilation?
When you run dbt compile, dbt performs three major tasks:
Jinja Resolution: It replaces all {{ ref('model_name') }} and {{ source(...) }} tags with the actual database paths (e.g., analytics.dev_schema.model_name).
Macro Expansion: If you have custom logic or loops (like generating a list of columns), dbt executes that logic and writes out the resulting SQL.
Manifest Creation: It updates the manifest.json file, which is the "master map" of your entire project's lineage and dependencies.
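To make the Jinja resolution step concrete, here is a sketch of a model before and after compilation. The model names and the `analytics.dev_schema` database path are illustrative; the actual path depends on your target configuration:

```sql
-- models/orders_enriched.sql (what you write)
select o.order_id, c.name
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c on o.customer_id = c.customer_id

-- target/compiled/.../orders_enriched.sql (what dbt compile produces)
select o.order_id, c.name
from analytics.dev_schema.stg_orders o
join analytics.dev_schema.stg_customers c on o.customer_id = c.customer_id
```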