"Attaching" databases (Database Integration)


In DuckDB, "attaching" is like opening a portal to another database file. Instead of importing the data, you tell DuckDB to look at and query that external database directly.

This is a core Data Engineering skill because it allows you to use DuckDB as a central hub to join data across different systems (like SQLite, Postgres, or even MySQL) without ever leaving the CLI.



🛠️ Practical Example: Joining SQLite with DuckDB

Imagine you have a legacy SQLite database containing user information, and you want to join it with a modern Parquet file containing trip data.

1. The Setup

First, ensure the scanner extension is ready:

INSTALL sqlite_scanner;
LOAD sqlite_scanner;

2. The ATTACH Command

This command "mounts" the SQLite file as an additional database inside your current session.

-- Attach the SQLite file and give it a nickname 'legacy'
ATTACH 'users_backup.db' AS legacy (TYPE SQLITE);

3. Querying Across Both

Now you can treat legacy like any other database in your session. You can even join one of its tables with a Parquet file you have on disk:
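Here is a minimal sketch, assuming the SQLite database has a users table and the trip data lives in a local file called trips.parquet with a user_id column (all of these names are illustrative):

-- Join the attached SQLite table with a local Parquet file
SELECT u.name, COUNT(*) AS trip_count
FROM legacy.users AS u
JOIN 'trips.parquet' AS t ON u.id = t.user_id
GROUP BY u.name
ORDER BY trip_count DESC;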


🔍 Why this is useful

  • Zero ETL: You didn't have to write a script to convert the SQLite database to CSV or Parquet first. You queried it in its original home.

  • Unified Interface: You can attach multiple databases at once. You could have ATTACH ... AS pg (TYPE POSTGRES) and ATTACH ... AS my_sqlite (TYPE SQLITE) and join them in a single query.

  • Data Migration: It's the easiest way to move data. If you want to convert that SQLite table to an optimized Parquet file:

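A minimal sketch, assuming the attached database exposes a users table (the table and file names are illustrative):

    -- Export a table from the attached SQLite database straight to Parquet
    COPY legacy.users TO 'users.parquet' (FORMAT PARQUET);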

Useful Commands for Inspection

Once attached, you can manage these connections with a few specific commands:

  • DETACH legacy;: Safely removes the connection to the external file.

  • SELECT * FROM duckdb_databases();: Lists all attached databases and their file paths.

  • SHOW ALL TABLES;: Shows tables from your main memory and all attached databases.
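
For example, a typical inspection and cleanup sequence might look like this (continuing with the legacy alias from above):

-- See what is currently attached and where each file lives
SELECT database_name, path FROM duckdb_databases();

-- List tables from the in-memory catalog and every attached database
SHOW ALL TABLES;

-- Disconnect from the SQLite file when you are finished
DETACH legacy;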

