Analytics Development Lifecycle (ADLC)


The Analytics Development Lifecycle (ADLC) is a methodology that brings the rigor of software engineering (SDLC) to the world of data and analytics. The infinity loop illustrates that analytics isn't a "one-and-done" project but a continuous cycle of improvement.

The loop is split into two halves: the DATA side (focused on building) and the OPS side (focused on running and learning).


The DATA Side: Building the Foundation

This half of the loop is where the "heavy lifting" of engineering and development happens.

  • Plan: This is the requirement-gathering phase. You define the business questions you're trying to answer, identify the necessary data sources, and determine the logic needed for the final output (metrics, dimensions, etc.).

  • Develop: Here, you write the code. In a modern stack, this usually involves SQL or Python transformations (e.g., using dbt). You build the models that turn raw data into clean, structured tables. (See the sketch after this list.)

  • Test: Before the data reaches an end-user, you must verify it. This includes unit tests for your code logic and data quality tests to ensure there are no nulls, duplicates, or unexpected values.

  • Deploy: Once the code is tested and reviewed, it is "shipped" to production. This often involves CI/CD (Continuous Integration/Continuous Deployment) pipelines that move your code from a development environment to the live warehouse.
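
Because Develop and Test carry most of the engineering weight, here is a minimal sketch of both stages together. It assumes DuckDB as the warehouse (one of the options named later in this doc) and a hypothetical raw_orders table; in a real project the transformation would live in a dbt model and the checks would be declared as dbt tests.

```python
# Develop + Test sketch: build a clean model from raw rows, then gate it
# behind data quality checks. DuckDB and raw_orders are stand-ins.
import duckdb

con = duckdb.connect()  # in-memory warehouse, enough for a sketch

# Stand-in for raw source data loaded by an ingestion tool.
con.execute("""
    CREATE TABLE raw_orders AS
    SELECT * FROM (VALUES
        (1, 'a@example.com', 120.0),
        (2, 'b@example.com',  80.0),
        (2, 'b@example.com',  80.0)  -- duplicate row: a classic quality issue
    ) AS t(order_id, customer_email, amount)
""")

# Develop: the transformation that turns raw data into a clean model.
con.execute("""
    CREATE TABLE stg_orders AS
    SELECT DISTINCT order_id, customer_email, amount
    FROM raw_orders
    WHERE amount IS NOT NULL
""")

# Test: checks equivalent to dbt's "unique" and "not_null" tests.
dupes = con.execute(
    "SELECT COUNT(*) - COUNT(DISTINCT order_id) FROM stg_orders"
).fetchone()[0]
nulls = con.execute(
    "SELECT COUNT(*) FROM stg_orders WHERE customer_email IS NULL"
).fetchone()[0]

assert dupes == 0, f"{dupes} duplicate order_id values"
assert nulls == 0, f"{nulls} null customer_email values"
print("stg_orders passed its quality checks")
```

The gate is the point: the model only moves on to Deploy once every check passes.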


The OPS Side: Running and Learning

Once the data is "live," the focus shifts to ensuring it stays healthy and provides value.

  • Operate: This is orchestration: scheduling your jobs to run at the right time (e.g., using Airflow or Dagster) and making sure the infrastructure scales to handle the workload. (A minimal DAG sketch follows this list.)

  • Observe: This is about "Data Observability." You monitor the health of the pipelines. If a source table fails to update or a metric looks "off," observability tools alert the team before the business makes a bad decision based on stale data.

  • Discover: This is the bridge back to the business. Stakeholders and analysts explore the clean data to find patterns or answer the questions defined in the "Plan" phase.

  • Analyze: You evaluate the results. Did the data product solve the original problem? This analysis often uncovers new questions or gaps in the data, which triggers the next Plan phase, starting the loop all over again.
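
To make Operate concrete, here is a minimal Airflow DAG that runs dbt every morning at 8:00 AM, mirroring the example at the end of this doc. The DAG id, project path, and task names are hypothetical, and this uses the Airflow 2.x API (older versions spell the schedule argument schedule_interval).

```python
# Operate sketch: an Airflow DAG that builds and tests the dbt project daily.
# dag_id, paths, and task names are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_dbt_run",
    start_date=datetime(2024, 1, 1),
    schedule="0 8 * * *",  # every day at 08:00
    catchup=False,
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/analytics",
    )
    run_tests = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/analytics",
    )

    run_models >> run_tests  # only test after the models have built
```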


Why the "Infinity" Loop Matters

The goal of this cycle is to eliminate "Data Silos" and "Trust Gaps." By following this loop:

  • Velocity increases: Small, incremental changes are easier to test and deploy than massive, quarterly updates.

  • Quality improves: Automated testing and observation catch errors early.

  • Alignment improves: The "Analyze" phase ensures that what you build actually meets the business need.


Mapping the ADLC to Specific Tools

To map the Analytics Development Lifecycle (ADLC) to specific tools, we look at the Modern Data Stack (MDS). Each stage of the loop has specialized software designed to handle that specific transition.

Here is how the tools typically map to each stage:

The "DATA" Side (Building)

| Stage | Common Tools | Role of the Tool |
| --- | --- | --- |
| Plan | Jira, Linear, GitHub Issues | Managing requirements, defining "Data Contracts," and tracking the sprint. |
| Develop | dbt (Data Build Tool), SQLMesh, Python | Writing the transformation logic. dbt is the industry standard for turning SQL into modular code. |
| Test | dbt tests, Great Expectations, Soda | Running "Data Quality" checks (e.g., "is this column unique?") and unit tests on logic. |
| Deploy | GitHub Actions, GitLab CI/CD | Automating the movement of code from dev to production once tests pass. |


The "OPS" Side (Running)

| Stage | Common Tools | Role of the Tool |
| --- | --- | --- |
| Operate | Airflow, Dagster, Kestra, Prefect | The orchestrators. They handle the "when" and "how" of running the jobs (e.g., "run dbt after Fivetran finishes"). |
| Observe | Monte Carlo, Elementary, Bigeye | Data Observability. These tools alert you if volume drops significantly or if data looks "weird" (anomalies). A hand-rolled version of such a check is sketched after this table. |
| Discover | Atlan, Select Star, CastorDoc | Data Catalogs. These help users find what data exists, who owns it, and where it came from (lineage). |
| Analyze | Tableau, Power BI, Metabase, Evidence | The BI/visualization layer where the business finally consumes the data and gets answers. |
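
To make Observe concrete, here is a hand-rolled sketch of the volume check that these tools automate. This is not any vendor's API; the table name, threshold, warehouse file, and Slack webhook URL are all hypothetical placeholders.

```python
# Observe sketch: alert if today's load looks suspiciously small.
# All names below are placeholders, not a vendor API.
import json
import urllib.request

import duckdb

EXPECTED_MIN_ROWS = 1000  # hypothetical baseline for daily volume
SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder URL

con = duckdb.connect("warehouse.duckdb")  # hypothetical warehouse file
rows = con.execute("SELECT COUNT(*) FROM stg_orders").fetchone()[0]

if rows < EXPECTED_MIN_ROWS:
    # Alert the team before anyone makes a decision on stale data.
    payload = json.dumps(
        {"text": f"stg_orders loaded only {rows} rows (expected >= {EXPECTED_MIN_ROWS})"}
    ).encode()
    req = urllib.request.Request(
        SLACK_WEBHOOK, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)
```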


The "Center" of the Loop: The Data Warehouse

While not a "stage," the entire loop rotates around your storage and compute layer.

  • Snowflake, BigQuery, or DuckDB: This is where the actual data lives while it moves through the ADLC stages.

The ADLC in Action (Example)

  1. Plan: You use Jira to track a request for a new "Churn Rate" metric.

  2. Develop: You write the SQL transformation in dbt.

  3. Test: Great Expectations confirms that the "Churn Rate" is never a negative number. (See the sketch after this list.)

  4. Deploy: GitHub Actions merges your code into the production branch.

  5. Operate: Airflow triggers the dbt job every morning at 8:00 AM.

  6. Observe: Monte Carlo sends a Slack alert if the source data from the CRM fails to load.

  7. Discover: An Analyst finds the new table in Atlan by searching for "Churn."

  8. Analyze: The CEO views the final chart in Metabase and decides on a new retention strategy.
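
As a sketch of step 3, here is roughly how that non-negative check might look. It uses Great Expectations' legacy pandas entry point (ge.from_pandas); newer releases restructure this API, and the data frame below is a hypothetical stand-in for the real churn table.

```python
# Test sketch: assert churn_rate always lands in [0, 1].
# Legacy Great Expectations pandas API; the data is a stand-in.
import great_expectations as ge
import pandas as pd

df = ge.from_pandas(
    pd.DataFrame({"month": ["2024-01", "2024-02"], "churn_rate": [0.042, 0.038]})
)

result = df.expect_column_values_to_be_between(
    "churn_rate", min_value=0, max_value=1
)
assert result.success, "churn_rate fell outside [0, 1]"
```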

