Relationship between Data Contracts, Data Quality, and Data Observability


The best way to distinguish them is by their primary function and timing in the pipeline.

The Core Difference: Prevention vs. Cure

| Feature | Data Contract Enforcement | Data Observability |
| --- | --- | --- |
| Analogy | The Bouncer / Gatekeeper | The Security Camera |
| Goal | Prevention. Stop bad data from ever entering the system. | Detection. Find out where, when, and why data broke, either as it happens or after the fact. |
| Timing | Upstream (Shift-Left). Checks happen at the source (Producer) or during CI/CD. | Downstream (End-to-End). Monitors the data in flight or at rest (Warehouse/Lake). |
| Action | Block. If the data violates the contract, the pipeline fails or the code merge is rejected. | Alert. If data looks anomalous, notify the engineer (e.g., via Slack/PagerDuty) to investigate. |

Data Contract Tools (The Enforcers)

These tools focus on the "handshake" between the Producer and Consumer. They are often integrated into the CI/CD process.

  • How they work: When a software engineer tries to merge code that changes a database schema, the tool checks the contract.yaml. If the change breaks the contract, the tool blocks the merge.

  • Examples: Avo, Gable, Confluent Schema Registry (for Kafka), or custom CI/CD scripts using validation libraries.
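The merge-time check described above can be sketched in a few lines. This is not the API of Avo, Gable, or Schema Registry; the contract structure, column names, and types below are illustrative assumptions, with the contract.yaml assumed to be already parsed into a dict.

```python
# Sketch of a CI-time contract check (illustrative, not any vendor's API).
# Assume contract.yaml has been parsed into a dict of column -> type.

def check_contract(contract: dict, proposed_schema: dict) -> list[str]:
    """Return a list of contract violations introduced by a schema change."""
    violations = []
    for column, expected_type in contract.items():
        if column not in proposed_schema:
            violations.append(f"breaking change: column '{column}' was removed")
        elif proposed_schema[column] != expected_type:
            violations.append(
                f"breaking change: '{column}' changed from "
                f"{expected_type} to {proposed_schema[column]}"
            )
    return violations

contract = {"order_id": "string", "total_amount": "decimal"}  # from contract.yaml
proposed = {"order_id": "string", "total_amount": "float"}    # schema in the PR

violations = check_contract(contract, proposed)
if violations:
    # In a real CI pipeline, a non-zero exit code here blocks the merge.
    print("\n".join(violations))
```

Note the asymmetry: adding a new column is not flagged, because additive changes are usually non-breaking for consumers.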

Data Observability Tools (The Monitors)

These tools provide a "Datadog-like" experience for data. They look at the overall health of the system.

  • How they work: They scan your data warehouse (Snowflake/BigQuery) and pipelines (Airflow/Spark) to detect anomalies. They often use Machine Learning to learn what "normal" looks like (e.g., "We usually get 10,000 rows on Mondays; today we got 50. Alert!").

  • Examples: Monte Carlo, Bigeye, Databand, Metaplane.
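The "learn what normal looks like" behavior can be approximated with simple statistics. A minimal sketch, not any vendor's actual algorithm, that flags a day's row count as anomalous when it falls far outside the historical distribution:

```python
import statistics

def is_anomalous(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag `today` if it is more than `threshold` standard deviations from
    the historical mean row count (a crude stand-in for the ML models that
    commercial observability platforms train on your warehouse metadata)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > threshold

# Mondays usually land around 10,000 rows; today only 50 arrived.
mondays = [9800, 10100, 10050, 9950, 10200, 9900]
print(is_anomalous(mondays, 50))     # True  -> would page the on-call
print(is_anomalous(mondays, 10020))  # False -> within normal variation
```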


Where do tools like "Great Expectations" fit?

Great Expectations (GX) is primarily a Data Validation and Quality Framework, not a full "Data Observability Platform" in the commercial sense (like Monte Carlo), though it is often used to achieve observability.

It sits in a unique middle ground:

  1. It is the "Engine" for Contracts:

    Great Expectations is the most popular library used to write the checks inside a Data Contract. When your contract says total_amount must be > 0, Great Expectations is likely the code running that check.

    • In this context, it is an Enforcement tool.
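In Great Expectations the total_amount rule would be expressed as an expectation; the plain-Python sketch below is not the GX API, only an illustration of what running such a rule at enforcement time means: check the incoming batch and reject it on failure (the record fields are made up).

```python
def find_violations(records: list[dict], column: str) -> list[dict]:
    """Return records whose `column` value violates the contract rule `> 0`.

    In an enforcement context, a non-empty result fails the pipeline
    before the batch is ever loaded."""
    return [r for r in records if not r.get(column, 0) > 0]

batch = [
    {"order_id": "a1", "total_amount": 42.0},
    {"order_id": "a2", "total_amount": -3.5},  # violates total_amount > 0
]
failures = find_violations(batch, "total_amount")
print(failures)  # the offending record(s); non-empty means block the load
```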

  2. It powers Observability:

    You can run Great Expectations on your data warehouse every night and log the results. If a test fails, you get an alert.

    • In this context, it is an Observability tool.
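That nightly loop boils down to "run checks, log results, notify on failure". A hedged sketch in which both the check runner and the alert hook are placeholders; a real setup would invoke a Great Expectations checkpoint and post to Slack or PagerDuty instead.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("nightly_checks")

def run_checks() -> dict:
    # Placeholder: a real job would run a Great Expectations suite here
    # and return its pass/fail results.
    return {"expect_total_amount_positive": True,
            "expect_no_null_order_ids": False}

def send_alert(message: str) -> None:
    # Placeholder: a real setup would post to Slack or page via PagerDuty.
    log.warning("ALERT: %s", message)

def nightly_job() -> list[str]:
    results = run_checks()
    failed = [name for name, passed in results.items() if not passed]
    for name in failed:
        send_alert(f"check failed: {name}")
    log.info("ran %d checks, %d failed", len(results), len(failed))
    return failed

nightly_job()
```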

However, it lacks the "Full Platform" features of dedicated Observability tools:

  • No Automatic Anomaly Detection: In GX, you have to write the rules manually (e.g., expect_column_values_to_not_be_null). A tool like Monte Carlo figures this out automatically using ML without you writing a single rule.

  • No Lineage: GX checks data quality, but it doesn't typically map out the visual graph of how data flows from Table A to Table B to Table C (Data Lineage), which is a staple of Observability platforms.
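Lineage is essentially a directed graph over tables. A toy sketch of the kind of structure an observability platform maintains (the table names are invented): given edges "A feeds B feeds C", it can answer "what breaks downstream if A is bad?"

```python
from collections import deque

# Toy lineage graph: table -> tables built from it (names are invented).
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["fct_revenue", "dim_customers"],
    "fct_revenue": ["weekly_report"],
}

def downstream(table: str) -> set[str]:
    """All tables transitively affected when `table` has bad data (BFS)."""
    seen, queue = set(), deque([table])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(downstream("raw_orders")))
# ['dim_customers', 'fct_revenue', 'stg_orders', 'weekly_report']
```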

Summary: The Ecosystem

  • Contract Tools: Define the rules (YAML).

  • Great Expectations: Runs the rules (Python library).

  • Observability Platforms: Monitor the results and the overall system health (Dashboard/Alerting).

