# Python

***

[Testing and debugging](https://www.freecodecamp.org/news/python-debugging-handbook/)

Testing PySpark:

* <https://www.confessionsofadataguy.com/introduction-to-unit-testing-with-pyspark/>
* [test PySpark code with Pytest](https://www.startdataengineering.com/post/test-pyspark/)

***

How to add tests to your data pipeline:

* <https://www.startdataengineering.com/post/how-to-add-tests-to-your-data-pipeline/>

How to add integration tests to your data pipeline:

* <https://www.startdataengineering.com/post/python-datapipeline-integration-test/>

***

### Pytest

`pytest` is the de facto standard test runner for Python because it's concise, powerful, and scales from simple scripts to complex data pipelines.

Here is the blueprint for mastering modern unit testing in Python.

***

#### The Core Philosophy

Unlike `unittest` from the standard library, `pytest` lets you write tests as plain functions rather than forcing you into classes. It also rewrites plain `assert` statements so that a failure shows the values on both sides of the comparison, without requiring special method calls like `self.assertEqual`.

**The Basic Syntax**

To get started, name your file `test_*.py`, name each test function `test_*`, and use the standard `assert` keyword.

```python
# math_logic.py
def add(a, b):
    return a + b

# test_math_logic.py
from math_logic import add  # import the function under test

def test_add():
    assert add(1, 2) == 3
```
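
For comparison, here is a sketch of the same check written both ways. The class name `TestAddUnittestStyle` and both test names are illustrative, and `add` is repeated so the snippet runs on its own.

```python
import unittest


def add(a, b):
    return a + b


# unittest style: a TestCase subclass and a dedicated assertion method.
class TestAddUnittestStyle(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(1, 2), 3)


# pytest style: a plain function and the built-in assert statement.
def test_add_pytest_style():
    assert add(1, 2) == 3
```

`pytest` collects and runs both forms, so an existing `unittest` suite can be migrated gradually.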

***

#### Testing Logic with Parameterization

Instead of writing ten different functions to test different inputs, modern testing uses parameterization. This keeps your code DRY (Don't Repeat Yourself) and makes it easy to add edge cases.

```python
import pytest

from math_logic import add  # function under test, defined in math_logic.py

@pytest.mark.parametrize("input_a, input_b, expected", [
    (1, 2, 3),
    (-1, 1, 0),
    (0, 0, 0),
    (100, 200, 300),
])
def test_add_multiple_cases(input_a, input_b, expected):
    assert add(input_a, input_b) == expected
```

***

#### Managing State with Fixtures

Fixtures are the "modern" way to handle setup and teardown. If you need a database connection, a Spark session, or a sample dataset, you define it as a fixture.

```python
import pytest

@pytest.fixture
def sample_data():
    return {"id": 1, "status": "active", "value": 100}

def test_process_data(sample_data):
    # The fixture is automatically injected into the function
    assert sample_data["status"] == "active"
```
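
The example above only covers setup. A minimal sketch of setup plus teardown, assuming an in-memory SQLite database stands in for a real connection: everything before `yield` runs before the test, everything after it runs once the test finishes. The helper `open_test_db`, the fixture name `db_connection`, and the table layout are made up for illustration.

```python
import sqlite3

import pytest


def open_test_db():
    """Hypothetical helper: create an in-memory database with one sample row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
    conn.execute("INSERT INTO transactions VALUES (1, 100.0)")
    return conn


@pytest.fixture
def db_connection():
    conn = open_test_db()  # setup: runs before the test
    yield conn             # the test body runs while the fixture is paused here
    conn.close()           # teardown: runs after the test, even if it failed


def test_row_count(db_connection):
    (count,) = db_connection.execute(
        "SELECT COUNT(*) FROM transactions"
    ).fetchone()
    assert count == 1
```

Keeping the setup logic in a plain helper also makes it reusable outside of pytest.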

***

#### Modern Mocking (The `mocker` pattern)

In the past, we used `unittest.mock.patch` as a decorator or context manager, which often led to "indentation hell." The modern way is to use the `pytest-mock` plugin (`pip install pytest-mock`), which provides a `mocker` fixture and undoes every patch automatically when the test ends.

* **Why Mock?** To isolate your code from external dependencies (API calls, file systems, or databases).

```python
def get_user_from_api(api_client):
    return api_client.get_user_name(1)

def test_get_user_from_api(mocker):
    # Create a mock object
    mock_client = mocker.Mock()
    mock_client.get_user_name.return_value = "John Doe"
    
    result = get_user_from_api(mock_client)
    
    assert result == "John Doe"
    mock_client.get_user_name.assert_called_once_with(1)
```
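
For contrast, this is roughly what the older context-manager style looks like with only the standard library. `ApiClient` is a hypothetical class invented for this sketch; its real method would make a network call, so the test patches it.

```python
from unittest.mock import patch


class ApiClient:
    """Hypothetical client; the real method would hit the network."""

    def get_user_name(self, user_id):
        raise RuntimeError("network call not available in tests")


def get_user_from_api(api_client):
    return api_client.get_user_name(1)


def test_get_user_with_patch():
    client = ApiClient()
    # The patch is only active inside the with block, hence the extra indentation.
    with patch.object(ApiClient, "get_user_name", return_value="John Doe") as mock_method:
        result = get_user_from_api(client)

    assert result == "John Doe"
    mock_method.assert_called_once_with(1)
```

With `pytest-mock`, the same patch would be written as `mocker.patch.object(ApiClient, "get_user_name", return_value="John Doe")` with no `with` block, which keeps the test body flat.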

***

#### Essential Tooling for Efficiency

To be truly efficient, you should integrate these three tools into your workflow:

| **Tool**         | **Purpose**                                            | **Command**               |
| ---------------- | ------------------------------------------------------ | ------------------------- |
| **pytest-cov**   | Measures how much of your code is actually tested.     | `pytest --cov=my_project` |
| **pytest-xdist** | Runs tests in parallel (crucial for large suites).     | `pytest -n auto`          |
| **Hypothesis**   | Property-based testing (generates edge cases for you). | *Advanced use*            |
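
As a taste of the Hypothesis row, here is a minimal property-based sketch, assuming `hypothesis` is installed (`pip install hypothesis`). Instead of listing inputs by hand, each test states a property that should hold for any integers; the test names are illustrative.

```python
from hypothesis import given, strategies as st


def add(a, b):
    return a + b


# Hypothesis generates many integer pairs, including 0, negatives, and large values.
@given(st.integers(), st.integers())
def test_add_is_commutative(a, b):
    assert add(a, b) == add(b, a)


@given(st.integers())
def test_adding_zero_changes_nothing(a):
    assert add(a, 0) == a
```

These tests are collected and run by `pytest` like any other test function.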

***

#### How to Run Your Tests

Simply type `pytest` in your terminal at the root of your project. It will auto-discover files named `test_*.py` or `*_test.py` and run the `test_`-prefixed functions inside them.

> **Pro Tip:** Use `pytest -vv` (very verbose) to see exactly what went wrong during a failure.

***

### Example

To demonstrate how `pytest` handles a multi-file architecture, we’ll build a mini **Data Processor**. This structure is common in modern Python development: one file for models, one for business logic, and one for validation rules.

#### The Project Structure

A clean project separates source code (`src`) from tests. `pytest` will automatically find any file starting with `test_` and any function starting with `test_`.

```
my_project/
├── src/
│   ├── models.py       # Data structures
│   ├── processor.py    # Business logic
│   └── validator.py    # Validation rules
└── tests/
    ├── test_processor.py
    └── test_validator.py
```

***

#### The Source Code (`src/`)

**`src/models.py`**

Simple data container using a Dataclass.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    id: int
    amount: float
    currency: str
```

**`src/validator.py`**

Contains specific rules. We want to test these individually.

```python
def is_valid_amount(amount: float) -> bool:
    return 0 < amount < 1_000_000
```

**`src/processor.py`**

The "Brain" that coordinates the other files.

```python
from src.validator import is_valid_amount

def calculate_total(transactions: list) -> float:
    total = 0.0
    for tx in transactions:
        if is_valid_amount(tx.amount):
            total += tx.amount
    return total
```

***

#### The Test Suite (`tests/`)

**`tests/test_validator.py`**

Here we use **Parameterization** to test multiple edge cases in one function.

```python
import pytest
from src.validator import is_valid_amount

@pytest.mark.parametrize("amount, expected", [
    (50.0, True),
    (-1.0, False),
    (0, False),
    (1_000_001, False)
])
def test_is_valid_amount(amount, expected):
    assert is_valid_amount(amount) == expected
```
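
The cases above do not touch the exact limits of the rule. A possible extension, sketched here with `pytest.param` so each case gets a readable ID in the test report; the IDs and the test name are illustrative, and the validator is repeated so the snippet runs on its own.

```python
import pytest


# Same logic as src/validator.py, repeated so the snippet is self-contained.
def is_valid_amount(amount: float) -> bool:
    return amount > 0 and amount < 1_000_000


@pytest.mark.parametrize(
    "amount, expected",
    [
        pytest.param(0.01, True, id="smallest-cent"),
        pytest.param(999_999.99, True, id="just-below-limit"),
        pytest.param(1_000_000, False, id="exactly-at-limit"),
    ],
)
def test_is_valid_amount_boundaries(amount, expected):
    assert is_valid_amount(amount) == expected
```

Because both comparisons are strict, an amount of exactly `1_000_000` is rejected; a boundary test like this documents that decision.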

**`tests/test_processor.py`**

Here we use a **Fixture** to provide a clean list of transactions for every test.

```python
import pytest
from src.models import Transaction
from src.processor import calculate_total

@pytest.fixture
def mock_transactions():
    """Provides a standard set of data for testing."""
    return [
        Transaction(id=1, amount=100.0, currency="USD"),
        Transaction(id=2, amount=200.0, currency="USD"),
        Transaction(id=3, amount=-50.0, currency="USD"), # Invalid amount
    ]

def test_calculate_total_filters_invalid(mock_transactions):
    # The logic should ignore the -50.0 transaction
    result = calculate_total(mock_transactions)
    assert result == 300.0
```
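
The `== 300.0` comparison works because these amounts add up exactly. With fractional amounts, floating-point rounding can break an exact comparison, so `pytest.approx` is the safer choice. A minimal sketch, with simplified stand-ins for the `src/` modules so it runs on its own:

```python
from dataclasses import dataclass

import pytest


@dataclass
class Transaction:
    id: int
    amount: float
    currency: str


def calculate_total(transactions: list) -> float:
    # Simplified stand-in for src/processor.py: add up amounts in the valid range.
    total = 0.0
    for tx in transactions:
        if 0 < tx.amount < 1_000_000:
            total += tx.amount
    return total


def test_calculate_total_with_fractional_amounts():
    transactions = [
        Transaction(id=1, amount=0.1, currency="USD"),
        Transaction(id=2, amount=0.2, currency="USD"),
    ]
    result = calculate_total(transactions)

    # 0.1 + 0.2 is not exactly 0.3 in binary floating point ...
    assert result != 0.3
    # ... so compare within a tolerance instead.
    assert result == pytest.approx(0.3)
```

For real monetary values, `decimal.Decimal` avoids the rounding issue altogether, but that would be a change to the model rather than to the tests.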

***

#### Running the Tests

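To run these, navigate to the `my_project` folder and run pytest through the interpreter. With this layout, `python -m pytest` adds the current directory to `sys.path`, which is what lets the tests resolve the `from src...` imports; a bare `pytest` call may fail with `ModuleNotFoundError` unless the package is installed or the import path is configured.

```bash
python -m pytest -v
```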

**Why this is "Modern & Efficient":**

1. **Isolation:** If `validator.py` breaks, `test_validator.py` will fail immediately, telling you exactly where the bug is before you even look at the complex processor logic.
2. **No Classes:** We didn't write `class TestSomething(unittest.TestCase)`. Plain functions are faster to write and easier to read.
3. **Fixtures:** The `mock_transactions` fixture can be shared by every test file in the `tests/` directory, with no imports needed, if we move it to a special file called `conftest.py`.
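
A sketch of what that `tests/conftest.py` could look like. In the project it would import `Transaction` from `src.models`; the dataclass is repeated here only so the snippet runs on its own, and the helper `build_transactions` is a made-up name.

```python
# tests/conftest.py (sketch)
from dataclasses import dataclass

import pytest


# In the project this would be: from src.models import Transaction
@dataclass
class Transaction:
    id: int
    amount: float
    currency: str


def build_transactions():
    """Hypothetical helper that builds the shared sample data."""
    return [
        Transaction(id=1, amount=100.0, currency="USD"),
        Transaction(id=2, amount=200.0, currency="USD"),
        Transaction(id=3, amount=-50.0, currency="USD"),  # Invalid amount
    ]


@pytest.fixture
def mock_transactions():
    """Available to every test under tests/ without an import."""
    return build_transactions()
```

Any test in `tests/` can then accept `mock_transactions` as an argument, and the fixture definition in `test_processor.py` can be deleted.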

***
