# Business Metrics and Terminology

***

## Business Metrics

While industry context is king, there is a core vocabulary of business metrics and concepts that allows a Data Engineer to move from "building pipelines" to "**driving value.**" Understanding these helps you design better data models, prioritize the right tables, and **communicate effectively with stakeholders.**

Here are the key categories of business terminology a Data Engineer should be comfortable with.

### **User & Growth Metrics**

These are often the "heartbeat" numbers for B2C, SaaS, or product-focused companies. You will frequently be asked to build pipelines that populate dashboards tracking these.

* **DAU / MAU (Daily/Monthly Active Users)**: The count of unique users who engage with a product within a specific window.
  * DE Context: You need to define what "active" means in the ETL logic. Is it a login? A click? A specific API call?

* **Churn Rate**: The percentage of customers who stop using the product or service within a given period.
  * DE Context: This often requires complex historical data modeling (Snapshot tables) to track who was active last month but isn't this month.

* **CAC (Customer Acquisition Cost)**: The cost to acquire a new paying customer (Marketing spend / # New Customers).
  * DE Context: This usually requires joining data from marketing platforms (Ads, Social) with internal transactional databases.

* **LTV (Lifetime Value) / CLV**: The total revenue a business expects from a single customer account.
  * DE Context: Requires aggregating a user's entire purchase history, often spanning years.

* **Cohorts**: Grouping users by a shared characteristic, usually their start date (e.g., "Jan 2024 Signups").
  * DE Context: You will often need to structure data to allow for "Cohort Analysis" so analysts can track how a specific group behaves over time.

### **Financial Metrics**

Even if you aren't in Fintech, you will handle revenue data. Precision here is non-negotiable.

* **ARR / MRR (Annual/Monthly Recurring Revenue)**: Normalized revenue numbers for subscription businesses.
* **Gross Margin**: Revenue minus the Cost of Goods Sold (COGS).
* **Fiscal Year (FY)**: The financial operating year, which often differs from the calendar year.
  * DE Context: Your date dimension tables must support fiscal quarters and years, or financial reports will be wrong.
* **EBITDA**: Earnings Before Interest, Taxes, Depreciation, and Amortization. A proxy for cash flow and operational profitability.

### **Strategic & Operational Frameworks**

These terms describe how the business measures its own success.

* **KPI (Key Performance Indicator)**: A quantifiable measure used to evaluate success in meeting objectives.
  * DE Context: If a KPI is "Reduce Latency," your pipeline metadata becomes the product.
* **OKR (Objectives and Key Results)**: A goal-setting framework.
  * DE Context: Engineering teams often have OKRs like "Migrate 100% of legacy data to the Warehouse," linking your technical work to business goals.
* **ROI (Return on Investment)**: The ratio of net profit to the cost of investment.
  * DE Context: You may need to justify a new tool (e.g., Snowflake vs. Redshift) by calculating the ROI of the migration.

### **Data Service Level Terms**

These are business terms specifically regarding the "product" you provide (the data).

* **SLA (Service Level Agreement)**: A formal commitment between a service provider and a client.
  * DE Context: "The `orders` table will be updated by 8:00 AM every morning."
* **SLO (Service Level Objective)**: The internal goal you aim for to meet the SLA (e.g., "99.9% pipeline uptime").
* **Time to Insight**: How long it takes from a data event happening (e.g., a sale) to it being visible in a dashboard.

### **Compliance & Governance**

Ignorance here can be a legal liability for the business.

* **PII (Personally Identifiable Information)**: Any data that can identify a specific individual (email, phone, SSN).
  * DE Context: You must know how to hash, mask, or segregate this data in the warehouse.
* **GDPR / CCPA**: European and Californian privacy laws.
  * DE Context: You need to build "Right to be Forgotten" pipelines that can mechanically delete a user's data across all systems upon request.

### **Summary Table**

| **Category** | **Term** | **Why a DE Cares**                                                                   |
| ------------ | -------- | ------------------------------------------------------------------------------------ |
| **Growth**   | Churn    | Requires "snapshotting" data to compare past vs. present states.                     |
| **Finance**  | ARR      | Revenue calculations require absolute precision and strict types (Decimal vs Float). |
| **Ops**      | SLA      | Dictates your on-call schedule and pipeline scheduling priority.                     |
| **Legal**    | PII      | Affects how you design access controls and data masking logic.                       |

***

## Examples

Translating business concepts into SQL is effectively the day-to-day job of a Data Engineer. You act as the translator between "Business English" and "Database Logic."

Here is how those five categories translate into actual SQL patterns you might implement in a data warehouse (like BigQuery, Snowflake, or PostgreSQL).

#### User & Growth: Calculating Churn

The Business Ask: "How many users did we lose last month?"

The Data Engineering Translation: You need to find users who were present in the *previous* period but are missing in the *current* period. This usually involves a `LEFT JOIN` or `EXISTS` logic.

```sql
WITH last_month_active AS (
    SELECT DISTINCT user_id
    FROM user_activity
    WHERE activity_date BETWEEN '2023-12-01' AND '2023-12-31'
),
current_month_active AS (
    SELECT DISTINCT user_id
    FROM user_activity
    WHERE activity_date BETWEEN '2024-01-01' AND '2024-01-31'
)

SELECT
    COUNT(prev.user_id) AS total_users_last_month,
    COUNT(prev.user_id) - COUNT(curr.user_id) AS churned_users,
    -- Cast to float to avoid integer division resulting in 0
    ROUND(
       (COUNT(prev.user_id) - COUNT(curr.user_id))::FLOAT / NULLIF(COUNT(prev.user_id), 0) * 100,
       2
    ) AS churn_rate_percent
FROM last_month_active prev
LEFT JOIN current_month_active curr
    ON prev.user_id = curr.user_id
-- We only want rows where the join failed (user exists in prev, but is NULL in curr)
WHERE curr.user_id IS NULL;
```

#### Financial Metrics: Fiscal Year Logic

The Business Ask: "Give me the revenue report for Q1 of Fiscal Year 2024."

The Data Engineering Translation: Companies often have Fiscal Years (FY) that don't match the Calendar Year (e.g., FY starts in April). You cannot rely on standard `YEAR()` functions; you must build custom logic or a Date Dimension.

```sql
SELECT
    DATE_TRUNC('month', transaction_date) as sales_month,
    -- Logic: If month is Jan(1), Feb(2), or Mar(3), it belongs to previous FY
    -- Otherwise, it belongs to the current calendar year's FY
    CASE
        WHEN EXTRACT(MONTH FROM transaction_date) < 4
        THEN EXTRACT(YEAR FROM transaction_date) - 1
        ELSE EXTRACT(YEAR FROM transaction_date)
    END AS fiscal_year,
    SUM(amount) AS mrr
FROM subscriptions
GROUP BY 1, 2
ORDER BY 1 DESC;
```

*Note: In a real production environment, you would join this against a `dim_date` table rather than calculating it on the fly every time.*

#### Strategic & Operational: Funnel Conversion (KPI)

The Business Ask: "Where are users dropping off in the signup process?"

The Data Engineering Translation: This requires "flagging" users as they pass through specific gates and aggregating the results.

```sql
SELECT
    COUNT(DISTINCT session_id) AS total_visitors,
    -- Step 1: Viewed Product
    COUNT(DISTINCT CASE WHEN event_name = 'view_item' THEN session_id END) AS viewed_product,
    -- Step 2: Added to Cart
    COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart' THEN session_id END) AS added_to_cart,
    -- Step 3: Purchased
    COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END) AS purchased,

    -- Calculate specific KPI: Cart-to-Purchase Conversion Rate
    (COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN session_id END)::FLOAT /
     NULLIF(COUNT(DISTINCT CASE WHEN event_name = 'add_to_cart' THEN session_id END), 0))
     AS cart_completion_rate
FROM web_events
WHERE event_date >= CURRENT_DATE - INTERVAL '30 days';
```

#### Data Service Level: Checking "Freshness" (SLA)

The Business Ask: "Is the dashboard up to date?"

The Data Engineering Translation: You need to check the difference between the "current time" and the "maximum timestamp" of your data. If the gap is too large, you alert the team.

```sql
SELECT
    'orders_table' AS table_name,
    MAX(loaded_at) AS last_ingestion_time,
    NOW() AS current_check_time,
    -- Calculate lag in minutes
    EXTRACT(EPOCH FROM (NOW() - MAX(loaded_at))) / 60 AS minutes_since_last_load,
    CASE
        -- SLA: Data must be fresher than 60 minutes
        WHEN EXTRACT(EPOCH FROM (NOW() - MAX(loaded_at))) / 60 > 60 THEN 'BREACH'
        ELSE 'HEALTHY'
    END AS sla_status
FROM raw.orders;
```

#### Compliance: PII Masking (GDPR)

The Business Ask: "Analytics teams need user data, but they cannot see email addresses or phone numbers."

The Data Engineering Translation: You create a "View" that exposes the data but hashes or masks the sensitive columns.

```sql
CREATE OR REPLACE VIEW analytics.users_safe AS
SELECT
    user_id,
    -- Hashing: Allows joining on email (e.g. to other systems) without seeing the email
    SHA256(email) AS email_hash,
    -- Masking: Shows domain only for analysis, hides the user
    CONCAT('*****@', SUBSTRING(email FROM POSITION('@' IN email) + 1)) AS masked_email,
    country,
    signup_date
FROM raw.users;
```

***