Managing extensions
DuckDB’s extension system is the "secret sauce" that allows it to remain a tiny, single-file binary while still being able to talk to S3, query Postgres, or perform geospatial analysis.
The Three Tiers of Extensions
Extensions are categorized based on who maintains them and how much you should trust them.
| Tier | Source | Security | Purpose |
| --- | --- | --- | --- |
| Built-in | Statically linked in the binary | ✅ Maximum | Critical features like parquet and json. |
| Core | Official DuckDB Repository | ✅ Signed by DuckDB | High-performance connectors like httpfs (S3), postgres_scanner, and iceberg. |
| Community | community-extensions.duckdb.org | ⚠️ Use with care | Specialized tools like gsheets (Google Sheets) or prql. |
Autoloading: "Magic" for Engineers
One of the best features for your ETL scripts is Autoloading. You don't always have to run INSTALL or LOAD.
If you run SELECT * FROM 's3://my-bucket/data.parquet', DuckDB detects the s3:// prefix and automatically installs and loads the httpfs extension for you.
Note: Autoloading is enabled by default in the CLI but might need to be explicitly configured when embedding DuckDB in a Golang or Python application for security reasons.
Install and load an extension manually:
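A minimal sketch, using httpfs as the example (any core extension name works the same way):

```sql
INSTALL httpfs;   -- download the extension once per machine/DuckDB version
LOAD httpfs;      -- activate it for the current session
```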
Essential Extensions for Your Kit
As you build your skills, these are the ones you will use 90% of the time:
- httpfs (S3/HTTP): Allows you to query files directly from cloud storage.
- postgres_scanner / mysql_scanner: Allows you to "mount" an external database and join its tables directly with local Parquet files.
- spatial: Adds PostGIS-like capability for geographic data.
- iceberg / delta: Crucial for "Lakehouse" architectures. It lets DuckDB read the metadata of these modern table formats.
Managing Extensions via CLI
Since you’re mastering the CLI commands, add these to your reference list:
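A sketch of the statements worth keeping handy (run from the DuckDB CLI; not an exhaustive list):

```sql
-- Inspect what is installed and loaded
SELECT extension_name, installed, loaded
FROM duckdb_extensions();

-- Install, load, and refresh extensions
INSTALL spatial;
LOAD spatial;
FORCE INSTALL spatial;   -- re-download even if already installed
UPDATE EXTENSIONS;       -- refresh installed extensions (available in recent DuckDB releases)
```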
httpfs extension
While S3 is the most common use case for the httpfs extension, its name actually gives away its broader purpose: HTTP File System. It essentially treats the entire internet (anything reachable via a URL) as a local drive for DuckDB.
Here is how httpfs works across different protocols and providers.
Direct HTTP(S) Querying
You don't need a cloud provider account to use httpfs. You can query any publicly hosted file (CSV, Parquet, or JSON) directly from a URL.
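A sketch (the URL and column names are hypothetical; autoloading will usually pull in httpfs for you):

```sql
-- Query a public Parquet file straight off the web
SELECT station_id, avg(temperature) AS avg_temp
FROM 'https://example.com/data/weather.parquet'
GROUP BY station_id;
```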
The "Range Request" Magic: This is the most impressive part. If you query a 10GB Parquet file but only need 1 column, httpfs uses HTTP Range Requests to download only the specific bytes for that column. It doesn't download the whole file.
Advanced httpfs Configuration
To make these remote queries work reliably, you often need to tune the "connection" settings. These are great to include in your "Best Practices":
- s3_uploader_max_filesize: Useful if you are using COPY to write to S3. It controls the multipart upload chunk size.
- http_timeout: If you're on a slow connection (like at a cafe) and querying a large remote file, increase this to prevent the query from timing out.
- http_retries: Crucial for production ETL. If a network hiccup occurs, DuckDB will retry the range request instead of failing the whole pipeline.
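For example (illustrative values; check the httpfs documentation for exact defaults and units):

```sql
SET http_timeout = 120000;              -- raise the timeout for slow connections
SET http_retries = 5;                   -- retry transient network failures
SET s3_uploader_max_filesize = '5GB';   -- cap per-file upload size, which influences multipart part sizing
```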
Global Cloud Support (GCS, AWS, and Azure)
httpfs handles Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage using the same logic, just with different URL prefixes and credential settings.
Amazon S3
S3 is the gold standard for httpfs. DuckDB doesn't just read from S3; it can also write to it, making it a powerful tool for uploading your "Silver" or "Gold" Parquet files directly to the cloud.
1. Connection Methods
There are two ways to connect DuckDB to your S3 bucket:
A. The "Legacy" Way (Environment Variables/Settings). This is what you'll see in most older tutorials.
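A sketch of the settings-based approach (credentials and bucket are placeholders):

```sql
SET s3_region = 'us-east-1';
SET s3_access_key_id = 'AKIA...';       -- placeholder
SET s3_secret_access_key = '...';       -- placeholder

SELECT count(*) FROM 's3://my-bucket/data.parquet';
```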
B. The "Modern" Way (Secrets Manager). As of 2024/2025, this is the best practice. It’s cleaner and allows you to manage multiple sets of credentials (e.g., one for prod, one for dev).
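A sketch using CREATE SECRET (the secret name, keys, and bucket are placeholders):

```sql
CREATE SECRET prod_s3 (
    TYPE S3,
    KEY_ID 'AKIA...',                   -- placeholder
    SECRET '...',                       -- placeholder
    REGION 'us-east-1'
);

SELECT count(*) FROM 's3://my-bucket/data.parquet';
```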
2. Writing to S3 (The "Uploader")
As a Data Engineer, you won't just be querying; you'll be publishing. httpfs allows you to stream a query result directly into an S3 object:
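For example (bucket and paths are hypothetical):

```sql
COPY (
    SELECT *
    FROM read_parquet('bronze/events/*.parquet')
    WHERE event_date = '2024-01-01'
) TO 's3://my-bucket/silver/events_2024-01-01.parquet' (FORMAT PARQUET);
```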
3. S3-Compatible Storage (MinIO / Cloudflare R2)
One of the best "hacks" for your portfolio is using httpfs with MinIO (for local S3 testing) or Cloudflare R2 (for zero-egress-fee storage). You simply change the endpoint:
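A sketch for a local MinIO instance (endpoint, keys, and bucket are placeholders; Cloudflare R2 works the same way with its own endpoint):

```sql
CREATE SECRET local_minio (
    TYPE S3,
    KEY_ID 'minioadmin',                -- placeholder
    SECRET 'minioadmin',                -- placeholder
    ENDPOINT 'localhost:9000',
    URL_STYLE 'path',
    USE_SSL false
);

SELECT * FROM 's3://test-bucket/sample.parquet';
```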
Google Cloud Storage (GCS)
DuckDB uses the s3:// syntax for GCS as well, but you have to point it to the Google endpoint:
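A sketch using the settings-based approach with GCS HMAC keys (the keys and bucket are placeholders):

```sql
SET s3_endpoint = 'storage.googleapis.com';
SET s3_access_key_id = 'GOOG...';       -- HMAC key, placeholder
SET s3_secret_access_key = '...';       -- placeholder

SELECT * FROM 's3://my-gcs-bucket/data.parquet';
```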
Azure Blob Storage
Azure uses its own dedicated prefix (az://) and credential settings:
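A sketch (the connection string, container, and path are placeholders; the azure extension is a separate install, and the Secrets Manager described below can hold the same credentials):

```sql
INSTALL azure;
LOAD azure;

SET azure_storage_connection_string = '...';   -- placeholder

SELECT * FROM 'az://my-container/data/*.parquet';
```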
Cloud Protocol Summary Table
This is how you should categorize the protocols:
| Protocol | Service | Extension Required | Key Settings |
| --- | --- | --- | --- |
| s3:// | AWS S3 | httpfs | s3_region, s3_access_key_id |
| s3:// | Google GCS | httpfs | s3_endpoint = 'storage.googleapis.com' |
| s3:// | MinIO / R2 | httpfs | s3_endpoint, s3_url_style |
| az:// | Azure Blob | azure (built on httpfs) | azure_storage_connection_string |
| https:// | Public Web | httpfs | None (Zero-Config) |
Security: Secrets Manager (DuckDB 0.10+)
https://duckdb.org/docs/stable/configuration/secrets_manager, https://duckdb.org/docs/stable/sql/statements/create_secret
Instead of manually SET-ing keys (which can leak into your logs), DuckDB now has a dedicated Secrets Manager within the extension system:
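For example, a persistent, scoped secret (all values are placeholders; persistent secrets are stored on disk by default so they survive restarts):

```sql
CREATE PERSISTENT SECRET prod_s3 (
    TYPE S3,
    KEY_ID 'AKIA...',                   -- placeholder
    SECRET '...',                       -- placeholder
    REGION 'eu-west-1',
    SCOPE 's3://my-prod-bucket'         -- only applies to this bucket
);

-- Inspect the secrets the current session can see
SELECT name, type, scope FROM duckdb_secrets();
```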
Once created, DuckDB automatically applies these credentials whenever you query an s3:// path.
The "Scanner" Pattern (External Databases)
While httpfs allows you to query files, the Scanner extensions (suffix _scanner) allow you to treat entire external databases as if they were local DuckDB schemas. This is a level above just reading a file.
Example: The SQLite Scanner. If you have an old SQLite database from a previous project, you don't need to export it to CSV first.
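A sketch (the file, table, and column names are hypothetical):

```sql
INSTALL sqlite;
LOAD sqlite;

-- Attach the SQLite file as if it were another schema
ATTACH 'old_project.sqlite' AS legacy (TYPE SQLITE);

-- Join it directly against local Parquet data
SELECT u.name, count(*) AS orders
FROM legacy.users u
JOIN read_parquet('orders/*.parquet') o ON o.user_id = u.id
GROUP BY u.name;
```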
The Community Repository
Since version 0.10.0, DuckDB introduced a formal Community Extension Repository. These are not hosted by DuckDB officially but are accessible through the CLI. To use these, you must specify the source.
Example: The Excel Extension. While your list showed excel as true, there is a community-enhanced version for advanced formatting.
Safety Note: When using community extensions, DuckDB requires you to be explicit. You might need to run SET custom_extension_repository = 'community-extensions.duckdb.org'; depending on your environment settings.
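Once the repository is reachable, the install pattern itself is simple. A sketch using gsheets from the tier table above (substitute whichever community extension you actually need):

```sql
INSTALL gsheets FROM community;
LOAD gsheets;
```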
Persistent vs. In-Memory Installations
A common point of confusion for new engineers is where these extensions actually "live" once you run INSTALL.
- Location: Extensions are usually installed in a hidden folder in your home directory (e.g., ~/.duckdb/extensions/...).
- Persistence: You only need to INSTALL an extension once per machine/version. However, you (or the autoloader) must LOAD it every time you start a new session.
You can verify the version of the extension to ensure it matches your DuckDB binary:
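For example (the extension_version column is available in recent DuckDB releases):

```sql
SELECT extension_name, extension_version, install_path
FROM duckdb_extensions()
WHERE installed;
```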
Full-Text Search (FTS)
Your list showed fts as installed. This is a "Built-in" extension that turns DuckDB into a mini-search engine (similar to Elasticsearch but much simpler).
Example: Indexing a column for search
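A sketch, assuming a documents table with doc_id and body columns:

```sql
INSTALL fts;
LOAD fts;

-- Build a full-text index over the body column
PRAGMA create_fts_index('documents', 'doc_id', 'body');

-- Rank matching documents with BM25 scoring
SELECT doc_id, score
FROM (
    SELECT doc_id,
           fts_main_documents.match_bm25(doc_id, 'duckdb extensions') AS score
    FROM documents
)
WHERE score IS NOT NULL
ORDER BY score DESC;
```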
Pro-Tip: The "Force" Load. If you are working in a restricted environment (like a CI/CD runner or a locked-down server) where DuckDB cannot reach the internet to download extensions, you must pre-install them and bundle the .duckdb_extension files located in the install_path shown in the query above.
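A sketch of what that looks like once the files are in place (the directory and paths are hypothetical):

```sql
-- Point DuckDB at a directory you control instead of ~/.duckdb
SET extension_directory = '/opt/duckdb_extensions';

-- Or install and load directly from a bundled file
INSTALL '/opt/duckdb_extensions/httpfs.duckdb_extension';
LOAD httpfs;
```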