FDW (read replicas)
Data Pipeline Pattern: The Federated Read Replica
This pattern solves a specific, common problem in the "Stage 1" data stack: How do we run dbt and analytics on production data without taking down the application?
It combines two distinct concepts:
Infrastructure: The Read Replica (The safe place where data lives).
Interface: The Foreign Data Wrapper (The bridge how you access it).
The Core Problem: The "Read-Only Trap"
When setting up analytics on Postgres, you face a dilemma.
Option A: Connect to Primary.
Risk: Your heavy analytical queries (e.g.,
SELECT * FROM orders) compete for CPU with your live application. If you spike the CPU, the app crashes.
Option B: Connect to a Read Replica.
Risk: Standard Physical Replicas are strictly Read-Only. Tools like dbt require a writable environment to create tables, views, and schemas (
dbt runfails because it cannotCREATE TABLE).
The Solution: The Federated Architecture
The solution is to decouple the Compute (where dbt runs) from the Storage (where the data lives).
The "Analytics DB" (Host): A small, separate, writable Postgres instance. This is where dbt connects. It stores your models, views, and marts.
The "Read Replica" (Remote): The read-only copy of production.
The Bridge (FDW): You use a Foreign Data Wrapper to mount the Replica's tables inside the Analytics DB.
Why this works
You get the best of both worlds:
Safety: The heavy data lifting happens on the Replica, protecting the Primary.
Flexibility: The Analytics DB is writable, so dbt can build whatever it needs.
Component Deep Dive: The Read Replica
Role: The Infrastructure (Source)
A Read Replica is a physical copy of your primary database. It receives a stream of changes (WAL files) from the primary to stay synchronized.
Key Characteristic: It is Read-Only. You cannot execute
INSERT,UPDATE, orDDL(Create/Alter) commands.The "Lag": Replicas are "Eventually Consistent." Data might be a few milliseconds or seconds behind the primary.
Purpose in this Stack: It acts as the "Heavy Lifter." When you query data, the Replica's CPU and Memory are used to scan the tables, keeping your Primary safe.
Component Deep Dive: Foreign Data Wrappers (Data Federation)
Role: The Interface (Bridge)
The Foreign Data Wrapper (FDW) is the implementation of the ISO SQL/MED standard ("Management of External Data"), released in 2003. It allows one database to read/write data in another system as if it were local tables.
While Postgres is the most famous modern user of this name, this is a universal architectural pattern known in the industry as Data Federation, Data Virtualization, or Database Linking.
The "Puppet Master" Pattern
The core idea is Abstraction. You have a "Host" database (The Analytics DB) and a "Remote" source (The Read Replica).
The Illusion: The Host creates a "Virtual Table" (Foreign Table). It looks like a table, has columns like a table, and accepts SQL.
The Translation: When you query it, the Host doesn't scan its own disk. It translates your SQL into the Remote system's language (API call, proprietary protocol, or SQL dialect).
The Network Hop: The Remote system executes the query and sends the raw results back to the Host.
Universal Landscape
Every major database implements this pattern under different names:
Technology
Feature Name
How it works
Oracle
Database Link
The grandfather of this pattern (SELECT * FROM table@remote). Powerful but tightly coupled.
SQL Server
Linked Server
Connects MSSQL to Oracle, Excel, etc. via OLEDB.
Snowflake
External Tables
Allows querying files in S3/Azure Blob without loading them ("FDW for Files").
BigQuery
Omni
Executes SQL against data in AWS S3 or Azure.
Trino / Presto
Query Engine
The "Pure" version. A database with no storage that connects to everything (Kafka, S3, Postgres) simultaneously.
Why Postgres FDW is Unique
Postgres made the interface Modular. Unlike Oracle or SQL Server, which mostly connect to other databases, the Postgres FDW community has written wrappers for everything:
Twitter FDW: Search tweets with SQL.
Stripe FDW: Query your payments API as a table.
Salesforce FDW: Join CRM data directly with your app data.
Performance Critical: Predicate Pushdown
The success of this pattern relies entirely on Predicate Pushdown. This determines whether your query is fast or a disaster.
Bad FDW (Naïve):
Query:
SELECT * FROM remote_table WHERE id = 5Action: The Host pulls ALL 1 million rows over the network, then filters for
id=5locally.Result: Network Bottleneck & Timeout.
Good FDW (Smart):
Query:
SELECT * FROM remote_table WHERE id = 5Action: The Host sends the specific SQL
WHERE id = 5to the Remote system. The Remote system filters it and sends back 1 row.Result: Fast.
Note: Postgres
postgres_fdwis very smart about pushdown. It will pushWHEREclauses,JOINs (if both tables are on the same remote server), and even some aggregates to the remote replica.
Implementation Checklist
To set this up in Postgres, you need 4 specific SQL commands on your Analytics DB (Host):
Summary Comparison
Read Replica
Foreign Data Wrapper
Category
Infrastructure (Server)
Interface (Software)
Primary Goal
Reliability (Handle Load)
Connectivity (Access Data)
Storage
Heavy (Physical Copy)
Zero (Metadata Only)
Best Use
Offloading traffic from Primary.
Bridging a writable DB to a read-only source.
When to use this pattern?
✅ Use it when: You need real-time access to fresh production data for dbt models without building a complex ETL pipeline (Airbyte/Fivetran).
❌ Avoid it when: You have massive datasets (TB scale). The network hop between the Replica and the FDW Host will eventually become too slow for complex joins.
Last updated