Event-driven architecture patterns



1. CQRS (Command Query Responsibility Segregation)

The Concept: Splitting the code that writes data (Command) from the code that reads data (Query).

  • For the Software Engineer (SE):

    • Goal: API Performance. They split the stack so that complex user queries (e.g., "Search for products") don't slow down high-speed transactions (e.g., "Checkout").

    • Relevance: They build the "Projections" (Read Models) into fast databases like Redis or Elasticsearch to power the UI.

  • For the Data Engineer (DE):

    • Goal: The Ultimate Read Model.

    • Relevance: In a modern architecture, the Data Warehouse (Snowflake/BigQuery) is essentially just a massive "Read Side" of the application's CQRS implementation.

    • Your Job: You consume the events emitted by the "Command Side" and build a Read Model optimized for Analytics (Star Schemas, OLAP Cubes) rather than for a UI.
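
A minimal sketch of that projection step in Python, assuming a hypothetical stream of OrderPlaced events and an illustrative fact_orders structure (the event fields, topic, and table names are made up for the example):

```python
# Hypothetical CQRS "read side" for analytics: the command side emits
# OrderPlaced events, and we project them into a denormalized fact table
# plus a tiny OLAP-style aggregate.

from collections import defaultdict
from datetime import datetime

fact_orders = []                      # rows destined for a fact table (e.g., fact_orders)
daily_revenue = defaultdict(float)    # a simple pre-aggregated view

def project(event: dict) -> None:
    """Turn one command-side event into analytical read-model rows."""
    if event["type"] != "OrderPlaced":
        return                        # this projection only cares about orders
    placed_at = datetime.fromisoformat(event["placed_at"])
    fact_orders.append({
        "order_id": event["order_id"],
        "customer_id": event["customer_id"],
        "amount": event["amount"],
        "order_date": placed_at.date(),
    })
    daily_revenue[placed_at.date()] += event["amount"]

# Example: replaying a small batch of events emitted by the command side.
for evt in [
    {"type": "OrderPlaced", "order_id": 1, "customer_id": 42, "amount": 99.0,
     "placed_at": "2024-11-29T14:00:00"},
    {"type": "OrderPlaced", "order_id": 2, "customer_id": 7, "amount": 15.5,
     "placed_at": "2024-11-29T14:05:00"},
]:
    project(evt)

print(fact_orders)
print(dict(daily_revenue))
```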


2. Event Sourcing

The Concept: Storing the sequence of state-changing events rather than just the current state.

  • For the Software Engineer (SE):

    • Goal: Logic Integrity. It allows them to handle complex business rules (like reversing a transaction accurately) and provides an audit log for free.

    • Relevance: They worry about "Snapshotting" so that replaying a long event history doesn't slow down application startup.

  • For the Data Engineer (DE):

    • Goal: Infinite Granularity & Time Travel.

    • Relevance: This is a gold mine. A standard database destroys history every time it overwrites a row; Event Sourcing preserves every micro-interaction.

    • Your Job: You use this to answer "Point-in-Time" questions that standard DBs cannot answer.

      • Example: "What was the value of our inventory at exactly 2:00 PM on Black Friday last year?"

      • With a standard DB backup, you only know the state at whatever moment the backup was taken. With Event Sourcing, you replay the stream up to that exact timestamp.
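
A minimal sketch of that replay, assuming an ordered, illustrative stream of inventory events (the field names, SKUs, and event store are made up for the example):

```python
# Point-in-time replay over an event-sourced inventory stream.
# In practice the stream would live in Kafka, EventStoreDB, or an
# append-only table; here it is just a list ordered by time.

from datetime import datetime

events = [
    {"sku": "A1", "delta": +100, "at": "2024-11-29T09:00:00"},  # stock received
    {"sku": "A1", "delta": -30,  "at": "2024-11-29T13:45:00"},  # morning sales
    {"sku": "A1", "delta": -25,  "at": "2024-11-29T14:30:00"},  # sales after 2 PM
]

def inventory_at(stream, as_of: datetime) -> dict:
    """Rebuild inventory state by replaying events up to `as_of`."""
    state: dict[str, int] = {}
    for e in stream:                                   # assumes time-ordered stream
        if datetime.fromisoformat(e["at"]) > as_of:
            break                                      # stop at the target timestamp
        state[e["sku"]] = state.get(e["sku"], 0) + e["delta"]
    return state

# "What was inventory at exactly 2:00 PM on Black Friday?"
print(inventory_at(events, datetime(2024, 11, 29, 14, 0)))   # {'A1': 70}
```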


3. ECST (Event-Carried State Transfer)

The Concept: Putting the entire changed object (payload) inside the event, not just the ID.

  • For the Software Engineer (SE):

    • Goal: Decoupling. Service B doesn't need to call Service A's API to get the user's email; it's right there in the event.

    • Relevance: They worry about message size limits (e.g., Kafka's default 1 MB message limit).

  • For the Data Engineer (DE):

    • Goal: Zero-ETL (or "Lite" ETL).

    • Relevance: If the events carry the full state, you don't need to perform complex joins or lookups during ingestion, and you don't need to ping the production database to "enrich" the data.

    • Your Job: You simply dump these rich events directly into the Data Lake (Bronze Layer). The data is already complete. It massively simplifies your pipelines.
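
A minimal sketch of that "dump it as-is" ingestion, assuming hypothetical local paths and event fields; in production the target would be object storage (S3/GCS) and a columnar format such as Parquet:

```python
# Landing ECST events into a "Bronze" layer as date-partitioned JSON Lines.
# Paths and field names are illustrative only.

import json
from datetime import datetime, timezone
from pathlib import Path

def land_event(event: dict, bronze_root: str = "bronze/orders") -> Path:
    """Append one state-carried event to today's partition, unmodified."""
    today = datetime.now(timezone.utc).date().isoformat()
    partition = Path(bronze_root) / f"ingest_date={today}"
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "events.jsonl"
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return path

# The event already carries the full customer and line-item state,
# so no lookups against the production database are needed.
land_event({
    "event": "OrderPlaced",
    "order_id": 789,
    "customer": {"id": 42, "email": "jane@example.com", "tier": "gold"},
    "items": [{"sku": "A1", "qty": 2, "price": 19.99}],
})
```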


4. CDC (Change Data Capture)

The Concept: Watching the internal transaction log of a database and turning every INSERT, UPDATE, and DELETE into a stream event.

  • For the Software Engineer (SE):

    • Goal: "The Outbox Pattern." It's a hack to reliably send events to other microservices without "Dual Write" issues (writing to the database and the message broker as two separate, non-atomic operations).

    • Relevance: They use it to trigger side effects (e.g., "When a user is inserted into Postgres, trigger an email").

  • For the Data Engineer (DE):

    • Goal: Database Replication.

    • Relevance: This is the most important pattern on this list for you. It is the modern replacement for "Batch Extraction."

    • Your Job: Instead of running a heavy SELECT * FROM Orders every night (which slows down the app), you run a CDC tool (like Debezium). It runs silently in the background, streaming changes to your warehouse in real time.

    • The Nuance: You must understand Log Compaction. CDC streams can get huge. You need to know how to "compact" the stream so you only keep the latest version of a row in your long-term storage.
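
A minimal sketch of that compaction step. The change-event envelope loosely follows Debezium's shape (op, before, after, ts_ms), but the exact fields and rows here are illustrative:

```python
# Compacting a stream of CDC change events down to the latest row image per key.

def compact(changes: list[dict], key: str = "id") -> dict:
    """Keep only the newest version of each row; drop deleted rows."""
    latest: dict = {}
    for c in sorted(changes, key=lambda c: c["ts_ms"]):  # oldest -> newest
        row_key = (c["after"] or c["before"])[key]
        if c["op"] == "d":          # delete: remove the row entirely
            latest.pop(row_key, None)
        else:                       # create/update: overwrite with the latest image
            latest[row_key] = c["after"]
    return latest

changes = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "NEW"},      "ts_ms": 100},
    {"op": "u", "before": {"id": 1, "status": "NEW"},
                "after": {"id": 1, "status": "SHIPPED"},                  "ts_ms": 200},
    {"op": "c", "before": None, "after": {"id": 2, "status": "NEW"},      "ts_ms": 150},
    {"op": "d", "before": {"id": 2, "status": "NEW"}, "after": None,      "ts_ms": 300},
]

print(compact(changes))   # {1: {'id': 1, 'status': 'SHIPPED'}}
```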

5. Event Notification

Event Notification is the "fundamental atom" of event-driven architecture.

While it is the simplest pattern, it creates very different challenges for Data Engineers compared to Software Engineers. In fact, for a Data Engineer, this pattern is often a performance trap.

The Concept: The "Thin Event." The event contains minimal information—usually just the ID of the entity that changed and the type of change. It does not contain the data itself.

  • Payload: { "event": "OrderPlaced", "orderId": 789, "timestamp": "12:00" }

  • The Implication: If the consumer wants to know what was bought, it must turn around and query the Producer's API (a "Callback").
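
A minimal sketch of that callback from the consumer's side, assuming a hypothetical GET /orders/{id} endpoint on the Order Service (the URL and response shape are made up for the example):

```python
# The callback a thin event forces: fetch the full record from the producer.

import requests

def handle_notification(event: dict) -> dict:
    """Receive a thin event, then call back to the Order Service for the details."""
    order_id = event["orderId"]
    # Hypothetical internal endpoint; only now do we learn what was actually bought.
    resp = requests.get(f"https://orders.internal/api/orders/{order_id}", timeout=5)
    resp.raise_for_status()
    return resp.json()

order = handle_notification({"event": "OrderPlaced", "orderId": 789, "timestamp": "12:00"})
```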

For the Software Engineer (SE)

  • Goal: Freshness & Security.

  • Relevance:

    • Always Up-to-Date: By sending only the ID, the consumer is forced to fetch the data right now. This ensures they don't process stale data (e.g., processing a payment for an order that was canceled 1 second ago).

    • Security: You don't accidentally broadcast PII (Personally Identifiable Information) or sensitive data into the event bus. You control access via the API callback.

For the Data Engineer (DE)

  • Goal: Orchestration (Triggers).

  • Relevance:

    • The Good (Triggers): This is excellent for workflow orchestration. An event like FileLandedInS3 is an Event Notification. Your Airflow DAG or AWS Lambda listens for it and starts a job (see the handler sketch after this list).

    • The Bad (The "N+1" Nightmare): If you are trying to build a Data Warehouse, this pattern is terrible.

    • Why? Imagine you receive 10,000 OrderPlaced events per second. To ingest this data, your pipeline has to make 10,000 API calls per second back to the Order Service to get the details.

    • Result: You will accidentally DDoS (Distributed Denial of Service) your own company's production API. The Software Engineering team will block you.
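
For the "good" trigger case above, a minimal sketch of an AWS Lambda handler that treats an S3 object-created notification purely as a signal to start a load job; the job-starting function is a placeholder for your orchestrator of choice (Airflow, Step Functions, a warehouse COPY, etc.):

```python
# Using an event notification purely as a trigger: react to an S3 notification
# and kick off a downstream load job. The notification carries only the bucket
# and key, not the data itself.

import urllib.parse

def start_load_job(bucket: str, key: str) -> None:
    # Placeholder: call your orchestrator or warehouse loader here.
    print(f"Starting load for s3://{bucket}/{key}")

def lambda_handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        start_load_job(bucket, key)
    return {"status": "ok"}
```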

Summary: The Intersection

| Pattern | The App Developer Builds... | The Data Engineer Consumes... |
| --- | --- | --- |
| CQRS | The Command Side (the trigger). | The events to build the Analytical View (the report). |
| Event Sourcing | The Event Store (the application brain). | The History (to rebuild state at any point in time). |
| ECST | The Rich Payload (for other services). | The Pre-Joined Data (to skip enrichment steps). |
| CDC | The Outbox (reliable messaging). | The Replica (real-time data warehouse sync). |
| Event Notification | The Signal (ID-only for security/freshness). | The Trigger (which forces a slow API callback to fetch actual data). |

Why the Event Notification row is unique: For every other pattern in this table, the Data Engineer gets the data inside the event. For Event Notification, you get a "job to do" instead of data. This is why it is often the most expensive pattern for data pipelines to handle at scale.

The Bottom Line:

The Software Engineer uses these patterns to protect the application state.

The Data Engineer uses these patterns to liberate the application state so it can be analyzed.

