DLT - Data Load Tool

Data ingestion tool


Docs

CLI commands reference

dlt fundamentals

Code examples with various scenarios

How dlt manages schema

Creating a pipeline

About pipeline state

Use workspace dashboard (web app) to see the state of a pipeline you've run locally

Adjusting the schema after running the pipeline

Accessing loaded data

Transforming data after loading it

Optimizing dlt performance

From local to production, Deploy a pipeline with Airflow

dlt workshops

How dlt uses Apache Arrow in its pipelines


What is dlt?

It is an open-source Python library that you install via a package manager like pip, poetry, uv, etc. Unlike platforms (like Airbyte or Fivetran) that run as separate services, dlt runs inside your Python code. This means you can run it in a Jupyter notebook, a Lambda function, or as a task within an Airflow DAG.

The "Killer Features" for Data Engineers

If you are building pipelines, dlt solves three specific headaches that usually require writing a lot of custom code:

1. Automated Schema Inference & Evolution. This is arguably dlt's strongest feature. If an API response suddenly contains a new field, user_rank, that wasn't there yesterday:

  • Traditional way: The pipeline fails because the target table doesn't have that column.

  • With dlt: It detects the new field, alters the table in your Data Warehouse (Snowflake, BigQuery, DuckDB, etc.) to add the column, and then loads the data.

2. Automatic Normalization (Unnesting). dlt handles nested JSON for you. If an API returns a list of dictionaries nested inside a key, it breaks that list out into a child table and generates the foreign keys to link it back to the parent record.

3. Declarative Incremental Loading. Instead of writing complex SQL logic to check MAX(updated_at), you flag a field in your Python resource as the cursor and dlt manages the state for you (see the sketch below).
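
A minimal sketch of that cursor pattern, with hardcoded rows standing in for a real API call; the field names are invented, but dlt.sources.incremental and the merge write disposition are the actual dlt mechanisms:

import dlt

@dlt.resource(primary_key="id", write_disposition="merge")
def orders(updated_at=dlt.sources.incremental("updated_at", initial_value="2024-01-01T00:00:00Z")):
    # In a real resource you would call your API here, passing
    # updated_at.last_value so you only fetch new or changed rows.
    # Hardcoded rows keep the sketch self-contained.
    yield {"id": 1, "updated_at": "2024-06-01T12:00:00Z", "status": "shipped"}
    yield {"id": 2, "updated_at": "2024-06-02T08:30:00Z", "status": "pending"}

On each run, dlt stores the highest updated_at value it has seen in the pipeline state and uses it as the cursor for the next run, so re-running the pipeline does not reload already-processed records.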

Where it fits in a Stack

Because it is just a library, it fits into the "Extract" and "Load" steps of an ELT pipeline.

  • You write: A Python script that yields data (dictionaries or lists).

  • dlt handles: Buffering, normalizing, creating tables, and inserting data into the destination.

  • Orchestration: You still use Airflow, Dagster, or Prefect to schedule the script.

Quick Example Mental Model

You define a "resource" (source of data) using a decorator, and then "run" a pipeline.
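
Here is a minimal sketch of that model, assuming a local DuckDB destination; the resource name and rows are invented, but the @dlt.resource decorator, dlt.pipeline(), and pipeline.run() are the core dlt entry points:

import dlt

@dlt.resource(name="issues")
def github_issues():
    # A resource is just a generator that yields dicts (or lists of dicts).
    # The nested "labels" list is broken out into a child table
    # (issues__labels) during normalization.
    yield {"id": 101, "title": "Fix login bug", "labels": [{"name": "bug"}, {"name": "urgent"}]}
    yield {"id": 102, "title": "Add dark mode", "labels": [{"name": "feature"}]}

pipeline = dlt.pipeline(
    pipeline_name="github_demo",
    destination="duckdb",        # swap for snowflake, bigquery, ...
    dataset_name="github_data",
)

# dlt infers the schema, creates the tables, and loads the rows.
load_info = pipeline.run(github_issues())
print(load_info)

Running this writes a DuckDB file next to the script; changing the destination string (plus credentials) is all it takes to point the same code at a warehouse.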

Why engineers like it

It bridges the gap between "I'll just write a quick Python script" (which is fast but fragile) and "I need to set up a heavy Enterprise ETL tool" (which is robust but expensive/complex). It allows you to write custom Python logic for extraction but gives you the robustness of an enterprise tool for the loading part.

