# Event-driven architecture patterns

***

{% embed url="https://solace.com/event-driven-architecture-patterns/" %}

***

#### 1. CQRS (Command Query Responsibility Segregation)

The Concept: Splitting the code that writes data (Command) from the code that reads data (Query).

* For the Software Engineer (SE):
  * Goal: API Performance. They split the stack so that complex user queries (e.g., "Search for products") don't slow down high-speed transactions (e.g., "Checkout").
  * Relevance: They build the "Projections" (Read Models) into fast databases like Redis or Elasticsearch to power the UI.

* For the Data Engineer (DE):
  * Goal: The Ultimate Read Model.
  * Relevance: In a modern architecture, the Data Warehouse (Snowflake/BigQuery) is essentially just a massive "Read Side" of the application's CQRS implementation.
  * Your Job: You consume the events emitted by the "Command Side" and build a Read Model optimized for Analytics (Star Schemas, OLAP Cubes) rather than for a UI.
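
The DE side of this split can be sketched as a small projection consumer. The event shape (`OrderPlaced`) and field names are illustrative, and a dict stands in for a real fact table in Snowflake/BigQuery:

```python
from collections import defaultdict

# Analytical read model: a tiny in-memory stand-in for a fact table,
# aggregated by product (the kind of shape a Star Schema fact would have).
fact_orders = defaultdict(lambda: {"units": 0, "revenue": 0.0})

def project(event: dict) -> None:
    """Consume a command-side event and update the analytical read model."""
    if event["type"] == "OrderPlaced":
        row = fact_orders[event["product_id"]]
        row["units"] += event["quantity"]
        row["revenue"] += event["quantity"] * event["unit_price"]

# Events emitted by the "Command Side" (shapes are hypothetical):
events = [
    {"type": "OrderPlaced", "product_id": "sku-1", "quantity": 2, "unit_price": 9.99},
    {"type": "OrderPlaced", "product_id": "sku-1", "quantity": 1, "unit_price": 9.99},
]
for e in events:
    project(e)

print(fact_orders["sku-1"]["units"])  # 3
```

The same consumer loop scales up conceptually: swap the dict for warehouse writes and the list for a Kafka topic, and you have the analytical "Read Side."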

***

#### 2. Event Sourcing

The Concept: Storing the sequence of state-changing events rather than just the current state.

* For the Software Engineer (SE):
  * Goal: Logic Integrity. It allows them to handle complex business rules (like reversing a transaction accurately) and provides an audit log for free.
  * Relevance: They worry about "Snapshotting" to ensure the app loads fast.
* For the Data Engineer (DE):
  * Goal: Infinite Granularity & Time Travel.
  * Relevance: This is a gold mine. Standard databases destroy history: an `UPDATE` overwrites the old value forever. Event Sourcing preserves every micro-interaction.
  * Your Job: You use this to answer "Point-in-Time" questions that standard DBs cannot answer.
    * *Example:* "What was the value of our inventory at exactly 2:00 PM on Black Friday last year?"
    * With a standard DB backup, you can't know. With Event Sourcing, you replay the stream to that exact timestamp.
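
The Black Friday question can be sketched as a replay over a hypothetical inventory stream (event shape and timestamps are illustrative):

```python
from datetime import datetime

# Event-sourced inventory: every stock change is an immutable event.
events = [
    {"ts": datetime(2024, 11, 29, 9, 0),   "sku": "widget", "delta": +100},
    {"ts": datetime(2024, 11, 29, 13, 30), "sku": "widget", "delta": -40},
    {"ts": datetime(2024, 11, 29, 15, 0),  "sku": "widget", "delta": -25},
]

def inventory_at(point_in_time: datetime, sku: str) -> int:
    """Replay the stream up to a timestamp to reconstruct past state."""
    return sum(e["delta"] for e in events
               if e["sku"] == sku and e["ts"] <= point_in_time)

# "What was the inventory at exactly 2:00 PM?" -- just stop the replay there.
print(inventory_at(datetime(2024, 11, 29, 14, 0), "widget"))  # 60
```

A snapshot of the table at end of day would only show the final count (35); the replay recovers any intermediate state.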

***

#### 3. ECST (Event-Carried State Transfer)

The Concept: Putting the *entire* changed object (payload) inside the event, not just the ID.

* For the Software Engineer (SE):
  * Goal: Decoupling. Service B doesn't need to call Service A's API to get the user's email; it's right there in the event.
  * Relevance: They worry about message size limits (e.g., Kafka's default 1 MB message limit).
* For the Data Engineer (DE):
  * Goal: Zero-ETL (or "Lite" ETL).
  * Relevance: If the events carry the full state, you don't need to perform complex joins or lookups during ingestion, and you don't need to ping the production database to "enrich" the data.
  * Your Job: You simply dump these rich events directly into the Data Lake (Bronze Layer). The data is already complete. It massively simplifies your pipelines.
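
A sketch of that "Lite" ETL path, assuming a hypothetical `CustomerUpdated` event: because the payload carries the complete entity, ingestion is a verbatim append to the Bronze layer (a list stands in for the Data Lake sink here):

```python
import json

# An ECST event carries the full changed object, not just an ID.
event = {
    "type": "CustomerUpdated",
    "customer_id": 42,
    "state": {  # the complete entity -- no callback to the producer needed
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "tier": "gold",
    },
}

def to_bronze(event: dict, sink: list) -> None:
    """'Zero-ETL' ingestion: append the event verbatim, one JSON line each."""
    sink.append(json.dumps(event))

bronze = []
to_bronze(event, bronze)

# Downstream layers can read the email without touching production:
print(json.loads(bronze[0])["state"]["email"])  # ada@example.com
```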

***

#### 4. CDC (Change Data Capture)

The Concept: Watching the internal transaction log of a database and turning every `INSERT`, `UPDATE`, and `DELETE` into a stream event.

* For the Software Engineer (SE):
  * Goal: "The Outbox Pattern." It's a hack to reliably send events to other microservices without dealing with "Dual Write" issues.
  * Relevance: They use it to trigger side effects (e.g., "When a user is inserted into Postgres, trigger an email").
* For the Data Engineer (DE):
  * Goal: Database Replication.
  * Relevance: This is the most important pattern on this list for you. It is the modern replacement for "Batch Extraction."
  * Your Job: Instead of running a heavy `SELECT * FROM Orders` every night (which slows down the app), you run a CDC tool (like Debezium). It runs silently in the background, streaming changes to your warehouse in real-time.
  * The Nuance: You must understand Log Compaction. CDC streams can get huge. You need to know how to "compact" the stream so you only keep the latest version of a row in your long-term storage.
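
Compaction can be sketched as keeping only the last change per primary key. The `op` codes below mimic Debezium-style create/update/delete markers, but the event shape is a simplified assumption, not Debezium's actual envelope:

```python
# Simplified CDC change events ("c"=create, "u"=update, "d"=delete).
changes = [
    {"op": "c", "key": 1, "row": {"id": 1, "status": "pending"}},
    {"op": "u", "key": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "c", "key": 2, "row": {"id": 2, "status": "pending"}},
    {"op": "d", "key": 2, "row": None},  # tombstone: the row was deleted
]

def compact(stream: list) -> dict:
    """Keep only the latest version of each row; drop deleted keys."""
    latest = {}
    for change in stream:
        if change["op"] == "d":
            latest.pop(change["key"], None)
        else:
            latest[change["key"]] = change["row"]
    return latest

print(compact(changes))  # {1: {'id': 1, 'status': 'shipped'}}
```

This is the same idea Kafka applies with `cleanup.policy=compact`: the full stream can grow without bound, but long-term storage only needs the latest state per key.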

***

#### 5. Event Notification

Event Notification is the "fundamental atom" of event-driven architecture, though it was omitted from the deep-dive list above.

While it is the simplest pattern, it creates very different challenges for Data Engineers compared to Software Engineers. In fact, for a Data Engineer, this pattern is often a **performance trap.**

The Concept: The "Thin Event." The event contains minimal information—usually just the ID of the entity that changed and the type of change. It does not contain the data itself.

* Payload: `{ "event": "OrderPlaced", "orderId": 789, "timestamp": "12:00" }`
* The Implication: If the consumer wants to know *what* was bought, it must turn around and query the Producer's API (a "Callback").

**For the Software Engineer (SE)**

* Goal: Freshness & Security.
* Relevance:
  * Always Up-to-Date: By sending only the ID, the consumer is forced to fetch the data *right now*. This ensures they don't process stale data (e.g., processing a payment for an order that was canceled 1 second ago).
  * Security: You don't accidentally broadcast PII (Personally Identifiable Information) or sensitive data into the event bus. You control access via the API callback.

**For the Data Engineer (DE)**

* Goal: Orchestration (Triggers).
* Relevance:
  * The Good (Triggers): This is excellent for workflow orchestration. An event like `FileLandedInS3` is an Event Notification. Your Airflow DAG or AWS Lambda listens for it and starts a job.
  * The Bad (The "N+1" Nightmare): If you are trying to build a Data Warehouse, this pattern is terrible.
  * Why? Imagine you receive 10,000 `OrderPlaced` events per second. To ingest this data, your pipeline has to make 10,000 API calls back to the Order Service to get the details.
  * Result: You will accidentally DDoS (Distributed Denial of Service) your own company's production API. The Software Engineering team will block you.
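
The N+1 trap can be sketched with a stub API (names and counts are illustrative). Each thin event forces one callback, so the call volume equals the event volume:

```python
api_calls = 0  # counter standing in for load on the production API

def fetch_order(order_id: int) -> dict:
    """Stub for the callback to the producer's API (the expensive part)."""
    global api_calls
    api_calls += 1
    return {"order_id": order_id, "total": 19.99}

# Thin events carry only IDs...
thin_events = [{"event": "OrderPlaced", "orderId": i} for i in range(1000)]

# ...so ingestion degenerates into one API call per event.
orders = [fetch_order(e["orderId"]) for e in thin_events]

print(api_calls)  # 1000 -- the N+1 ingestion trap
```

At 10,000 events per second, that counter is the request rate you are inflicting on the production service.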

***

#### Summary: The Intersection

| **Pattern**        | **The App Developer Builds...**              | **The Data Engineer Consumes...**                                        |
| ------------------ | -------------------------------------------- | ------------------------------------------------------------------------ |
| CQRS               | The Command Side (the trigger).              | The events to build the **Analytical View** (the report).                |
| Event Sourcing     | The Event Store (the application brain).     | **The History** (to rebuild state at any point in time).                 |
| ECST               | The Rich Payload (for other services).       | The **Pre-Joined** **Data** (to skip enrichment steps).                  |
| CDC                | The Outbox (reliable messaging).             | **The Replica** (real-time data warehouse sync).                         |
| Event Notification | The Signal (ID-only for security/freshness). | **The Trigger** (which forces a slow API callback to fetch actual data). |

Why the Event Notification row is unique: For every other pattern in this table, the Data Engineer gets the data *inside* the event. For Event Notification, you get a "job to do" instead of data. This is why it is often the most expensive pattern for data pipelines to handle at scale.

The Bottom Line:

The Software Engineer uses these patterns to protect the application state.

The Data Engineer uses these patterns to liberate the application state so it can be analyzed.

***
