Data Strategy Concepts Overview


Figure: Data Concepts network graph, showing the hierarchy and relationships between some of the data concepts.


Learn about data strategy from Fivetran


Data Governance

Data Owner, Data Steward, Data Custodian

Think of Data Governance as the "Constitution" or "City Planning Code" of your data ecosystem.

If Data Engineering is the construction (building pipelines, warehouses, and tables), Data Governance is the law (zoning regulations, safety codes, and blueprints) that dictates how those structures should be built and who is allowed to enter them.

More formally, Data Governance is a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models that describe who can take what actions with what information, when, under what circumstances, and using what methods.

The Core Purpose: Why do we need it?

Without governance, a data lake or warehouse degrades into a "Data Swamp": a chaotic dump of unverified files where no one knows whether the numbers are accurate or who has access to sensitive customer information.

Governance serves three main goals:

  • Trust: Ensuring the CFO trusts the numbers on the dashboard.

  • Compliance: Ensuring you don't face regulatory fines or lawsuits because you accidentally leaked European users' personal data (violating GDPR).

  • Security: Ensuring only the right people have access to sensitive PII (Personally Identifiable Information).

The 3 Pillars of Data Governance

Governance is rarely just a software tool; it is a combination of People, Process, and Technology.

A. People (The Roles)

  • Data Owner: The senior business leader (e.g., VP of Sales) who is ultimately responsible for the data's quality and security.

  • Data Steward: The domain expert (e.g., Sales Ops Manager) who defines the rules. They say, "A 'Sales Lead' is only valid if it has a phone number."

  • Data Custodian (You/Data Engineer): The technical person who implements these rules in the database and pipelines.
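To make the split between the Steward (who defines the rule) and the Custodian (who implements it) concrete, here is a minimal sketch, assuming leads arrive as plain Python dictionaries. The field names `lead_id` and `phone` and the helpers `is_valid_lead` and `split_leads` are hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical record shape: each lead is a dict with "lead_id" and "phone" keys.
Lead = dict[str, Any]


def is_valid_lead(lead: Lead) -> bool:
    """Steward's rule: a Sales Lead is only valid if it has a phone number."""
    phone = lead.get("phone")
    # Missing keys, None, and whitespace-only strings all count as "no phone number".
    return isinstance(phone, str) and phone.strip() != ""


def split_leads(leads: list[Lead]) -> tuple[list[Lead], list[Lead]]:
    """Custodian's implementation: route each lead to a valid or rejected list."""
    valid = [lead for lead in leads if is_valid_lead(lead)]
    rejected = [lead for lead in leads if not is_valid_lead(lead)]
    return valid, rejected


if __name__ == "__main__":
    leads = [
        {"lead_id": 1, "phone": "+1-555-0100"},
        {"lead_id": 2, "phone": None},
        {"lead_id": 3, "phone": "   "},
    ]
    valid, rejected = split_leads(leads)
    print([lead["lead_id"] for lead in valid])     # [1]
    print([lead["lead_id"] for lead in rejected])  # [2, 3]
```

Keeping the rule in one named function means the Steward can review its definition without reading the pipeline code that calls it.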

B. Process (The Rules)

  • Policies: "All customer data must be encrypted at rest."

  • Standards: "All dates must be stored in ISO 8601 format (YYYY-MM-DD)."

  • Workflows: "If a schema changes, it must be approved by the Governance Council."
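A Standard like the date rule is the kind of policy a Custodian turns into an automated check. The sketch below uses only the standard library and assumes dates arrive as strings; `conforms_to_date_standard` and `non_conforming` are hypothetical helper names. It accepts only the YYYY-MM-DD calendar form, which is stricter than ISO 8601 as a whole (the standard also defines other forms, such as week dates and the compact YYYYMMDD).

```python
import re
from datetime import date

# Only the extended calendar form YYYY-MM-DD is accepted (hypothetical house standard).
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def conforms_to_date_standard(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not _DATE_PATTERN.fullmatch(value):
        return False  # wrong shape, e.g. "01/15/2024" or "20240115"
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-30
    except ValueError:
        return False
    return True


def non_conforming(values: list[str]) -> list[str]:
    """Return every value that violates the date standard, for reporting."""
    return [v for v in values if not conforms_to_date_standard(v)]


if __name__ == "__main__":
    print(non_conforming(["2024-01-15", "01/15/2024", "2024-02-30", "20240115"]))
    # ['01/15/2024', '2024-02-30', '20240115']
```

The regular expression checks the shape, and `date.fromisoformat` checks that the date actually exists; neither check alone is enough.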

C. Technology (The Tools)

  • Data Catalog: A "Search Engine" for your company's data (e.g., Alation, Collibra, DataHub). It tells you where data is and what it means.

  • Data Lineage: Maps showing how data flows from Source A → Transformation B → Dashboard C.

  • Access Control: Systems (like RBAC in Snowflake) that enforce who can see what.
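Data lineage is essentially a directed graph. The sketch below assumes lineage is recorded as (upstream, downstream) pairs; the node names mirror the Source A → Transformation B → Dashboard C example and are hypothetical. Walking downstream answers "what breaks if this source changes?" (impact analysis), and walking upstream answers "where did this number come from?".

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream, downstream).
EDGES = [
    ("source_a.orders", "transform_b.daily_revenue"),
    ("source_a.customers", "transform_b.daily_revenue"),
    ("transform_b.daily_revenue", "dashboard_c.revenue"),
]


def build_graph(edges: list[tuple[str, str]]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Index the edges in both directions."""
    downstream: dict[str, set[str]] = defaultdict(set)
    upstream: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first search: all nodes reachable from start, not counting start itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


if __name__ == "__main__":
    downstream, upstream = build_graph(EDGES)
    print(sorted(reachable(downstream, "source_a.orders")))
    # ['dashboard_c.revenue', 'transform_b.daily_revenue']
    print(sorted(reachable(upstream, "dashboard_c.revenue")))
    # ['source_a.customers', 'source_a.orders', 'transform_b.daily_revenue']
```

Catalog tools typically build a graph like this from query logs or pipeline metadata; the traversal logic stays the same.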

Governance vs. Data Management

This is a common interview question. The easiest way to remember it is Strategy vs. Execution.

| Feature | Data Governance | Data Management |
| --- | --- | --- |
| Role | The Architect / Lawmaker | The Builder / Contractor |
| Focus | Strategy, Policy, & Rules | Execution, Implementation, & Operations |
| Analogy | Designing the blueprints & safety codes. | Pouring the concrete & laying the pipes. |
| Example | Deciding that "User IDs must be unique." | Implementing a uniqueness constraint or a duplicate-detection check in the pipeline to enforce it. |
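One way the Builder might enforce "User IDs must be unique" is a check that fails the pipeline when duplicates appear. Here is a minimal sketch, assuming rows arrive as dictionaries; `assert_unique` and `UniquenessViolation` are hypothetical names. In SQL the equivalent test is typically a `GROUP BY user_id HAVING COUNT(*) > 1` query that must return no rows.

```python
from collections import Counter
from typing import Any, Iterable


class UniquenessViolation(Exception):
    """Raised when a column that must be unique contains duplicate values."""


def assert_unique(rows: Iterable[dict[str, Any]], column: str) -> None:
    """Raise UniquenessViolation if any value of `column` appears more than once."""
    counts = Counter(row[column] for row in rows)
    duplicates = {value: n for value, n in counts.items() if n > 1}
    if duplicates:
        raise UniquenessViolation(f"{column} has duplicate values: {duplicates}")


if __name__ == "__main__":
    rows = [{"user_id": 1}, {"user_id": 2}, {"user_id": 2}]
    try:
        assert_unique(rows, "user_id")
    except UniquenessViolation as err:
        print(err)  # user_id has duplicate values: {2: 2}
```

Failing loudly, rather than silently dropping duplicates, surfaces the upstream problem so the Data Steward can decide how it should be fixed.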

What this means for you as a Data Engineer

As a Data Engineer, you are often the enforcer of governance. You don't just move data; you build the "guardrails" that the Governance team requests.

  • Access Control: You write the Terraform or SQL scripts that grant read (SELECT) access to Analysts while masking the credit card columns (for example, with a Snowflake masking policy).

  • Quality Checks: You implement the "Data Contracts" (as we discussed previously) to ensure data meets the standards defined by the Data Stewards.

  • Tagging: You apply tags like sensitive_pii to tables in Snowflake so the Data Catalog can automatically flag them.
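Conceptually, the access-control and tagging duties combine like this: columns carry tags, and a policy decides, per role, whether a tagged column is returned in clear text or masked. Below is a minimal Python sketch of that logic only. It is not how Snowflake implements it; in Snowflake you would use GRANT statements, object tags, and masking policies. The role names, tag map, and mask string are hypothetical.

```python
from typing import Any

# Hypothetical column tags, as a custodian might apply them.
COLUMN_TAGS: dict[str, set[str]] = {
    "customer_id": set(),
    "name": {"sensitive_pii"},
    "credit_card": {"sensitive_pii"},
    "order_total": set(),
}

TABLE_READERS = {"ANALYST", "COMPLIANCE_AUDITOR"}  # hypothetical roles with read access
PII_READERS = {"COMPLIANCE_AUDITOR"}  # hypothetical roles allowed to see PII unmasked
MASK = "***MASKED***"


def read_row(row: dict[str, Any], role: str) -> dict[str, Any]:
    """Return the row as `role` is allowed to see it."""
    if role not in TABLE_READERS:
        raise PermissionError(f"role {role} has no read access")
    result = {}
    for column, value in row.items():
        # Columns missing from COLUMN_TAGS are treated as untagged here;
        # a stricter policy might mask them by default.
        is_pii = "sensitive_pii" in COLUMN_TAGS.get(column, set())
        result[column] = MASK if is_pii and role not in PII_READERS else value
    return result


if __name__ == "__main__":
    row = {"customer_id": 42, "name": "Ada", "credit_card": "4111111111111111", "order_total": 99.5}
    print(read_row(row, "ANALYST"))
    # {'customer_id': 42, 'name': '***MASKED***', 'credit_card': '***MASKED***', 'order_total': 99.5}
    print(read_row(row, "COMPLIANCE_AUDITOR")["credit_card"])
    # 4111111111111111
```

Driving the mask from tags rather than from column names is what lets a single policy protect every newly tagged column automatically.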

Summary

Data Governance is not about "policing" people to slow them down; it is about creating a safe, organized environment where people can find high-quality data quickly without breaking laws or crashing systems.

