GitLab CI/CD


GitLab CI/CD docs: https://docs.gitlab.com/ci/

Validate the syntax of your configuration: https://docs.gitlab.com/ci/yaml/lint/

Predefined variables (available in every GitLab CI/CD pipeline): https://docs.gitlab.com/ci/variables/predefined_variables/

GitLab CI/CD keywords for YAML configuration files: https://docs.gitlab.com/ci/yaml/

CI/CD component examples: https://docs.gitlab.com/ci/components/examples/

CI/CD inputs: https://docs.gitlab.com/ci/inputs/

GitLab CI/CD Security course from GitLab University


What Gitlab CI/CD is

GitLab CI/CD is GitLab’s built-in continuous integration, delivery, and deployment system. It automates:

  • Building code

  • Running tests

  • Packaging artifacts

  • Deploying to environments

  • Ensuring quality gates / approvals

It is configured through a single file inside the repo: .gitlab-ci.yml.

GitLab CI/CD is tightly integrated: repo → merge requests → pipelines → environments → deployments → observability.


🏗️ GitLab CI/CD Architecture (High-Level)

GitLab’s CI/CD architecture consists of five main components:

GitLab Server (Coordinator)

This includes GitLab Rails/Workhorse/Gitaly. It is responsible for:

  • Parsing .gitlab-ci.yml

  • Creating pipeline DAGs (jobs, stages, rules)

  • Storing pipeline metadata

  • Authenticating/authorizing runners

  • Scheduling CI jobs to available runners

  • Tracking job logs, statuses, artifacts

Think: brain of the CI system.


GitLab Runners

Runners are the compute nodes that actually execute jobs.

They can be:

  • Shared runners – provided by GitLab (SaaS) or by your company

  • Project/group runners – assigned to specific areas

  • Specific runners – dedicated to one project

  • Ephemeral runners – auto-scaled on cloud VMs or Kubernetes

Each runner is an agent registered with the GitLab coordinator.

Think of your .gitlab-ci.yml file as the blueprint. Runners are the machines that carry out the work. When a pipeline is triggered, available runners check in with GitLab to pick up jobs that perform various tasks, like running tests, building apps, or deploying changes.

GitLab’s runner system includes:

  1. GitLab Runner (the software) This is the application/program you actually install on a server or machine. Think of it as the "engine" - it's the binary executable that sits on your infrastructure waiting for work to do.

  2. Runners (the agents) These are the configured instances or "workers" that the GitLab Runner software manages. Each runner is registered with your GitLab instance and can execute CI/CD pipeline jobs.

Each runner runs inside an environment defined by an executor like Docker or Shell.

Potential issues with Runners and their troubleshooting

When creating job definitions in your .gitlab-ci.yml file, you have the ability to specify which runners can execute those jobs. This capability is essential for guaranteeing that jobs execute in appropriate environments—with the necessary permissions and resources available.

Why Runner Selection Matters

  1. Some jobs may require specific environments or resources.

  2. You may want to reserve certain runners for specific job types.

  3. Security requirements may limit which runners can be used.

Runner selection in pipelines

GitLab considers several factors when matching a job to a runner:

  • The runner's availability and access level

  • Tags assigned to runners

  • Protected runners for sensitive operations

Runner Availability GitLab follows a specific hierarchy when selecting runners: it checks project-level runners first, then group-level runners (along with any parent groups), and finally instance-level runners. This ordering means more specific runners take precedence, giving you tailored environments where they matter most.

A typical CI/CD organization might use instance-level runners for standard microservices to minimize upkeep, while reserving project-specific runners for sensitive operations like payment processing. This strategy provides a good balance between convenience and security.

Instance-wide runners simplify administrative work, whereas project-dedicated runners can handle high-priority operations. Most teams can adopt this approach without changing their existing pipeline definitions.

Using Runner Tags Tags function as descriptive labels on runners, indicating what they're equipped to handle—for instance, 'android' or 'xcode'. You can use these tags to direct jobs toward runners with the necessary capabilities, guaranteeing that builds happen in appropriate environments.

Consider a mobile development team at a CI/CD-enabled company: they use tags to route iOS builds to runners with Xcode and Android builds to runners with the Android SDK. This precision reduces configuration mistakes and makes better use of available resources.

Tags ensure jobs only execute where the required tooling exists. Teams gain better environment separation, and there's no risk of untagged runners accidentally claiming incompatible jobs.

Using Protected Runners Protected runners only accept jobs from protected branches and tags, making them perfect for production pipelines. This restriction guarantees that only verified code reaches your live systems.

A company might configure a protected runner specifically for production releases to their Kubernetes infrastructure. This creates an additional security boundary around deployment operations.

With protected runners, only approved branches can initiate deployments. Sensitive credentials stay contained, and you can require manual sign-off as an extra safeguard for production changes.

Runner selection gives you precise control over the location, method, and circumstances under which your pipeline tasks execute.

Best practices for runner selection

  • Use specific tags to match jobs with appropriate environments. Example: Apply labels such as docker, linux, or android to ensure proper routing.

  • Establish consistent tagging conventions across your organization. Tip: Document your approved tags in a central location like a wiki or README for team reference.

  • Reserve protected runners for release pipelines. Benefit: Keeps sensitive credentials separate and restricts who can trigger critical deployments.

  • Avoid over-tagging your jobs. Best practice: Only specify the essential tags required for execution to prevent jobs from becoming unassignable.

  • Manage resource contention strategically when runner capacity is constrained. Strategy: Configure less critical jobs as interruptible so they can be preempted by high-priority work.

Diagnosing runner problems is a crucial skill for CI/CD practitioners. When jobs behave unexpectedly, recognizing typical runner-related issues enables quicker and more assured responses. If a job fails to launch or encounters unexpected failures, the underlying issue could stem from runner setup or connectivity challenges.

Identifying Runner Issues in Job Logs

When jobs fail in GitLab CI/CD, the job log provides your primary diagnostic resource. GitLab organizes logs into distinct sections that make it easier to determine whether issues stem from runner setup, network problems, or resource availability.

  • Job Start Section This area displays the GitLab Runner version and identifies which runner accepted the job. Verify that the correct runner was selected—particularly important when jobs need specific capabilities (such as Docker or shell executors).

  • Preparation Section This segment reveals how the runner configures the job environment. Failures in this phase often point to problems downloading images, setting up executors, or retrieving secured credentials.

  • Script Execution Look for system-level errors that indicate runner problems, such as:

    • Cannot allocate memory → Insufficient RAM on the runner

    • Connection reset by peer → Network connectivity loss

    • No space left on device → Exhausted disk capacity on the runner

These represent infrastructure limitations rather than flaws in your job's script logic.

Learning to read job logs effectively is critical for rapid runner troubleshooting, reducing delays, and maintaining development momentum.


Executors

The relationship between Runner and Executor:

While a runner picks up CI/CD jobs from GitLab, the executor determines how and where those jobs are run. Runners support many executors:

  • Shell: runs directly on the host machine (fast, but insecure, no isolation)

  • Docker: most common; runs each job in a container

  • Docker Machine: auto-scales VMs

  • Kubernetes: one pod per job; cloud-native

  • Custom: any custom environment

  • SSH: executes commands on remote hosts

Executors are critical because they determine isolation, scale, and portability.

A little more information about some executor types

Shell Executor Commands execute directly on the host system with the Shell executor, offering a simple approach for tasks needing direct server access. A company may leverage this on their secure deployment infrastructure when releasing production updates.

When you need straightforward execution and direct host system interaction, this executor type works well.

Docker Executor Each job gets its own fresh container when using the Docker executor, which isolates tasks from one another. A company may find this particularly valuable for frontend builds and testing, since it prevents jobs from interfering with each other.

Container-based execution provides both stronger security boundaries and more predictable behavior throughout the development workflow.

Kubernetes Executor Jobs run inside individual pods with the Kubernetes executor, which makes it great for scaling in cloud environments. A company may use this approach for their mobile builds, where they need to handle many simultaneous jobs efficiently.

Teams operating in cloud-native setups benefit from this executor's ability to dynamically manage resources and performance.

SSH Executor Remote command execution happens through SSH connections with this executor, giving you flexibility to work with distributed systems. A company may find it useful for their older infrastructure during a backend system migration.

This option helps bridge the gap when you're working with established systems while moving toward newer architectures.


Pipelines → Stages → Jobs DAG

A pipeline is a graph made of:

  • Pipeline (full execution)

  • Stages (sequential blocks, e.g. build → test → deploy)

  • Jobs (individual tasks)

  • Needs/DAG (modern approach, jobs run in parallel when dependencies are met)

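A minimal sketch of both styles, with illustrative job names. A stage-based pipeline runs each stage to completion before the next one starts:

```yaml
stages: [build, test, deploy]

build-app:
  stage: build
  script: make build

unit-tests:
  stage: test
  script: make test

deploy-prod:
  stage: deploy
  script: make deploy
```

With needs, a job starts as soon as the jobs it depends on have finished, instead of waiting for the entire previous stage:

```yaml
unit-tests:
  stage: test
  needs: [build-app]
  script: make test
```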


Artifacts, Packages, Environments

GitLab CI/CD automatically manages outputs:

  • Artifacts – files produced in jobs (binaries, logs, reports)

  • Cache – dependency caches between jobs

  • Environment deployments – dev/staging/prod

  • Releases & packages – container registry, package registry


🔁 How a Pipeline Runs (Execution Flow)

1. Developer pushes code or opens MR

Triggers pipeline based on rules:

  • on push

  • on merge request

  • schedule

  • manual

  • webhook

2. GitLab reads .gitlab-ci.yml

It parses:

  • stages

  • jobs

  • variables

  • rules

  • dependencies

Generates a directed acyclic graph (DAG).

3. Jobs wait in the queue

The GitLab coordinator places all pending jobs into a queue, matched to runners by tag.

4. Runners pull jobs

Runners use a pull model: they poll the GitLab server for pending jobs, so the server never needs an inbound connection into the runner's network.

They match jobs using:

  • Tags (docker, k8s, gpu, linux)

  • Runner assignments

  • Resource permissions

5. Runner executes the job

Based on executor configuration.

Typical steps:

  1. Checkout source code

  2. Restore caches

  3. Run job script

  4. Save artifacts

  5. Upload logs

6. GitLab updates pipeline + MR status

GitLab shows:

  • Success

  • Failure

  • Skipped

  • Manual action required

7. (Optional) Deployments + Observability

GitLab can:

  • Deploy to Kubernetes

  • Create an environment URL

  • Track deployments via GitLab Deployments API

  • Integrate with Metrics/Tracing


🧩 Key Concepts of GitLab CI/CD

Here are the essential pieces:

.gitlab-ci.yml

Example minimal pipeline:
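A minimal sketch (job names and commands are illustrative):

```yaml
stages:
  - build
  - test

build-job:
  stage: build
  script:
    - echo "Building..."

test-job:
  stage: test
  script:
    - echo "Running tests..."
```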

Tags

Used for routing jobs:
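A hedged sketch; the tag name and build command are illustrative:

```yaml
build-android:
  tags:
    - android            # only runners tagged "android" can pick this up
  script:
    - ./gradlew assembleDebug
```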

Variables

Pipeline-level or job-level:
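A sketch with a hypothetical API_URL variable:

```yaml
variables:               # pipeline-level, visible to all jobs
  API_URL: "https://staging.example.com"

smoke-test:
  variables:             # job-level, overrides the pipeline value
    API_URL: "https://dev.example.com"
  script:
    - curl "$API_URL/health"
```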

Rules

Modern conditional logic:
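A sketch using predefined variables (job name and script are illustrative):

```yaml
deploy:
  script: ./deploy.sh
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never                        # skip on merge request pipelines
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual                       # require a manual click on main
    - when: on_success                   # otherwise run normally
```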

Artifacts & Cache

Saves files between jobs and stages. For example, a build job can save a public/ folder that later jobs reuse.

Artifacts = persist between stages. Cache = speed up builds.

Dependencies


📦 Putting It All Together: Full Architecture Diagram (Text)
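A text sketch assembled from the components described above:

```
Developer push / merge request
        │
        ▼
GitLab Server (Rails / Workhorse / Gitaly)
  - parses .gitlab-ci.yml
  - builds the pipeline DAG
  - queues jobs by tag
        │   (runners poll for work)
        ▼
GitLab Runner ──► Executor (Shell / Docker / Kubernetes / SSH)
  - checkout, restore cache, run scripts
  - save artifacts, upload logs
        │
        ▼
GitLab UI: pipeline + MR status, artifacts, environments, deployments
```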


🧠 When to Use GitLab CI/CD

GitLab shines if you want:

  • A unified Git+CI system

  • Strong merge request workflows

  • Kubernetes-native deployments

  • Built-in security scanning (SAST, DAST, dependency, secret detection)

  • Self-hosted + multi-cloud flexibility

  • Complex DAG pipelines

It’s extremely popular for DevOps teams that want an all-in-one platform.


Security practices on GitLab

https://docs.gitlab.com/user/application_security/get-started-security/


Pipeline Types

https://docs.gitlab.com/ci/pipelines/pipeline_types/

Parent–Child Pipelines

A single main pipeline can trigger several smaller pipelines to run in parallel. For example, this becomes useful when a team breaks a large application into microservices, with each service having its own testing pipeline.

Multi-Project Pipelines

These pipelines span multiple repositories or projects and allow coordinated workflows. A common use case is when an organization introduces an additional service—such as a new analytics component—and wants deployments across both codebases to be synchronized.

Merge Request Pipelines

These run automatically whenever changes are pushed to a merge request. Teams often use them to speed up code reviews and detect bugs earlier in the development cycle.

Merge Trains

Merge trains queue and merge multiple merge requests safely and in a controlled order. This is especially helpful for teams where several developers push changes around the same time and want to avoid integration conflicts.


What is the include keyword?

GitLab allows you to modularize and share pipeline configurations using the include keyword.

This enables you to:

  • Eliminate duplicate logic across different files

  • Distribute pipeline components among multiple projects

  • Maintain templates from a single location for simpler updates

When you apply include, GitLab combines the referenced external YAML with your .gitlab-ci.yml during pipeline execution.

Before using include, teams often repeat the same or similar configuration in every project:
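A hedged illustration: the same test job copy-pasted into each project's .gitlab-ci.yml (names are hypothetical):

```yaml
# repeated verbatim in project-a, project-b, project-c, ...
unit-tests:
  stage: test
  image: node:20
  script:
    - npm ci
    - npm test
```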

And here is the same configuration reused with include:
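A sketch of the shared setup; the project and file names are hypothetical:

```yaml
# .gitlab-ci.yml in each consuming project
include:
  - project: 'my-group/ci-templates'
    file: '/templates/test-jobs.yml'
```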

This approach lets you define a job a single time—then apply it across numerous projects—without duplicating the same YAML repeatedly. Let's examine the various methods for including files from different sources.

include: local

Use this method to reference a YAML file within the same repository:
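For example:

```yaml
include:
  - local: '/ci/test-jobs.yml'
```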

When your /ci/test-jobs.yml contains:
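A minimal sketch of what that file might contain:

```yaml
test:
  stage: test
  script:
    - echo "Running tests..."
```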

The test job will execute alongside any jobs specified in your primary .gitlab-ci.yml file.

Tip: GitLab's Pipelines section displays the fully expanded YAML configuration. This view helps with debugging or understanding how your included files merged into the complete pipeline definition.

include:project

This method lets you reference YAML files from different GitLab projects within your instance. It's valuable when distributing configurations among several projects.
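For example (ref may be a branch name, a tag, or a SHA):

```yaml
include:
  - project: 'create-group/ci-config'
    ref: main
    file: '/ci/test-jobs.yml'
```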

Here, the test job from /ci/test-jobs.yml in the create-group/ci-config repository's main branch gets incorporated into your pipeline. The ref keyword accepts SHAs or tags as alternatives to branch names.

include:remote

This option enables you to pull in YAML files from external URLs beyond your GitLab instance. For instance:
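A hedged sketch; the exact raw-file URL is illustrative:

```yaml
include:
  - remote: 'https://gitlab.com/example-project/-/raw/main/.gitlab-ci.yml'
```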

This example incorporates the .gitlab-ci.yml file from example-project hosted on GitLab.com into your pipeline.


CI/CD Components and CI/CD Catalog

What Are CI/CD Components?

Component for auto-versioning

CI/CD Components are reusable, versioned building blocks for pipelines. Consider them modular templates that are:

  • Versioned – ensuring updates don't disrupt your existing pipelines

  • Parameterized – allowing you to provide inputs for customized behavior

  • Self-contained – designed around a specific function like linting or testing

  • Discoverable – available through the GitLab CI/CD Catalog

How to Include a Component

To incorporate a component, use the include keyword—but specify a component: reference rather than local, project, or remote.
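A sketch of a component include; the component path and version here are illustrative:

```yaml
include:
  - component: $CI_SERVER_FQDN/components/markdownlint/markdownlint@1.0.0
```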

This example utilizes the markdownlint component from the GitLab CI/CD Catalog. It comes prebuilt and ready for integration into any pipeline. By adding this component, you instantly access Markdown linting functionality without handling its configuration manually. We'll explore the details of including, using, and creating CI/CD Components in subsequent modules.

Understanding a Component Reference

A team lead introduces a developer to CI/CD Components—modular, version-controlled pipeline pieces that anyone in the organization can plug into their workflows.

They share this reference as an example:
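The reference they discuss:

```yaml
include:
  - component: $CI_SERVER_FQDN/components/yamllint/yamllint@1.4.3
```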

Together, they walk through what each part represents:

  • $CI_SERVER_FQDN: refers to the GitLab instance automatically, so there is no need to hardcode the domain

  • components/yamllint: the group and project where the reusable component is stored

  • yamllint: the specific component provided by that project

  • @1.4.3: the exact release version that the pipeline should pull in
With a single line of configuration, the developer enables YAML linting across multiple repositories with consistent and centrally managed rules.


Enhancing Pipelines with Multiple Components

As their pipelines grow, they start combining several components—for code quality checks, container security, Go builds, and more.

Each referenced component adds a ready-to-use job or set of jobs, giving the team immediate advantages:

  • Automatically adopt established standards

  • Leverage expertise from specialists (security, language tooling, etc.)

  • Reduce duplication and keep .gitlab-ci.yml files clean and maintainable



GitLab CI/CD Catalog

In the GitLab CI/CD catalog, you can browse ready-made components for many common tasks, including:

  • Executing tests for a variety of languages

  • Building and pushing Docker images

  • Deploying to different types of environments

  • Running security and compliance scans

…and plenty of other workflow needs.

CI/CD Components must be stored in the /templates/ directory at the root of the repository. This standardized location makes components more discoverable and maintainable.

Adjusting a Component’s Behavior by adding Inputs

A development team recently added a component that runs static analysis to improve code quality. Everything works smoothly—except for one detail:

The component executes in the test stage by default, but the team already uses a dedicated lint stage:

Instead of modifying or duplicating the component’s job, the team looks into whether the component can be configured.


Discovering Available Inputs

Checking the CI/CD Catalog, they learn that the static-analysis component supports a few adjustable inputs, including:

  • stage

  • image

With that information, they update their pipeline:
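A hedged sketch; the component path, version, and image tag are illustrative:

```yaml
include:
  - component: $CI_SERVER_FQDN/components/static-analysis/static-analysis@1.0.0
    inputs:
      stage: lint          # run in the team's existing lint stage
      image: python:3.12   # hypothetical desired Python version
```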

Now the analysis job runs in the correct stage and uses the desired Python version—no rewriting required.


Where to Find Input Documentation

Every component listed in the CI/CD Catalog includes documentation describing its configurable inputs. You can locate this information in several places:

  • The component’s README Typically inside the repository where the component is defined.

  • The Catalog entry The component’s page in the CI/CD Catalog links directly to its README and lists supported inputs.

  • Usage examples Often included by the component maintainers in either the README or example subdirectories.

These resources help you understand how to customize each component to fit your team’s workflow.

Managing Component Versions

Why Versioning Matters

Controlling which version of a CI/CD Component your pipeline uses is essential. Locking the component to a specific release helps ensure consistent behavior—even as maintainers add new features or make changes.


An Unexpected Pipeline Failure

During a staging deployment, a pipeline that previously worked without issues suddenly starts failing—despite no recent code changes. After some investigation, the team discovers the root cause in their configuration:
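The offending reference looked something like this (the component path is illustrative):

```yaml
include:
  - component: $CI_SERVER_FQDN/components/security-scan/scan@~latest   # always tracks the newest release
```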

The component was referenced using @~latest, which automatically pulls the most recent version. A new release had added a required parameter, and because the pipeline wasn’t specifying it, the job failed.

The realization: “We didn’t notice we were tracking an always-updating version—and a breaking change slipped in.”


Selecting an Appropriate Version Reference

Different version references serve different purposes. Here’s how they compare:

  • Commit SHA (e.g. a9f4cbd72318eef...): maximum stability and auditability; ideal when you must guarantee exact behavior

  • Tag (e.g. @1.3.0): stable, released versions that won't change unexpectedly

  • Branch (e.g. main): follows the latest development work; useful for internal components or testing edge features

  • @~latest: always fetches the newest release; suitable only for experimentation or non-critical pipelines where breakage is acceptable

By choosing the right version reference, teams can strike a balance between stability, predictability, and agility.


GitLab CI/CD Global and Default Keywords

Full list of supported values for the default keyword

A small engineering group suddenly expands—from a handful of developers to a full-sized team. With more people contributing, the CI setups start drifting apart: different runtime versions, mismatched tools, and the usual “but it passes on my laptop” problems.

This is where GitLab’s global keywords make a difference. They allow you to define pipeline-wide defaults—ensuring every job starts with the same baseline configuration. Let’s break down why these keywords are so important and how they help maintain consistency across larger teams:

Inconsistent Runtime Versions

Example: Across a mid-sized engineering group, developers were unknowingly using a mix of Node.js versions—anything from 12 to 20. Some CI jobs failed purely because people copied old snippets from past repositories.

Multiple Dependency Install Methods

Example: Different developers preferred different commands: npm ci, npm install, and even pnpm install. Each approach produced different lockfile behavior and caching results, leading to unpredictable builds.

Mismatched Database Environments

Example: Local and CI test environments may run various Postgres versions—anywhere between 9.6 and 15—even though the production environment required a specific, newer release.


The Fix — The default Keyword

The default keyword lets you set shared configuration for all jobs in the pipeline unless a job overrides those settings. It’s declared once, at the top of the .gitlab-ci.yml, and instantly unifies behavior across the entire pipeline.

For example:

With this approach, every job starts with:

  • The same base container

  • A consistent database version

  • A unified dependency installation method

GitLab provides several default-level keywords—covering images, services, scripts, timeouts, artifacts, and more. The documentation lists all available options, but the principle is simple: set the standard once and let every job benefit from it.


GitLab CI/CD: Artifacts and Cache

Even though GitLab pipelines run inside fresh, isolated containers, real-world CI/CD almost always needs to share files, results, or dependencies so work isn’t repeated unnecessarily. GitLab provides two mechanisms for this: artifacts and cache. They sound similar but serve very different purposes.

Let’s break them down.


Artifacts — Deliverables That Move Through the Pipeline

Artifacts docs, Dependencies docs

Artifacts are files or directories that a job explicitly hands off to later stages.

Think of them as pipeline outputs: something a job produces that another job depends on.

What Artifacts Are Used For

Use artifacts when you want to transfer something forward in the pipeline, such as:

  • Compiled frontend bundles

  • Built binaries

  • Test reports / coverage reports

  • Generated documentation

  • Scan results

A job in a later stage can download and use these artifacts.

Important Behavior

  • Artifacts are available only to jobs in later stages, not parallel jobs in the same stage

  • They’re stored by GitLab and can be downloaded via the UI

  • They expire after a retention period unless you customize the duration with expire_in (set expire_in: never to keep them indefinitely)

Quick Example
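A sketch with illustrative job names:

```yaml
stages: [build, deploy]

build:
  stage: build
  script:
    - npm run build        # writes dist/
  artifacts:
    paths:
      - dist/

deploy:
  stage: deploy
  script:
    - ./deploy.sh dist/    # uses the artifact from the build job
```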

The dist/ folder created by the build job is passed to the deploy job.

Another example:
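A hedged sketch of keeping a test report around for a limited time:

```yaml
test:
  stage: test
  script:
    - npm test -- --coverage
  artifacts:
    when: always           # upload even if the tests fail
    expire_in: 1 week
    paths:
      - coverage/
```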


Cache — Speed Boosters for Repeated Work

Cache docs, Caching visualized, Official caching examples

Cache is designed to speed up jobs, not transfer deliverables.

While artifacts are about sharing, cache is about avoiding re-downloading or recomputing.

What Cache Is Used For

Cache shines with data that:

  • Takes a long time to download

  • Doesn’t need to be versioned

  • Can be reused across jobs and pipelines

Examples:

  • node_modules/

  • .m2/ Maven repository

  • Python venv/

  • Docker build layers

  • Large dependency folders

Important Behavior

  • Cache is typically shared across jobs and pipelines within the same project

  • Not intended for build outputs

  • Cache keys control when caches are reused or invalidated

  • A job restores the cache before running, and uploads it afterwards (if changed)

  • Not guaranteed: Caches can be cleared or evicted, so your pipeline should still work without them (just slower)

  • Upload timing: Cache is uploaded after script succeeds, so failed jobs don't update the cache

  • Pull/Push policies: You can control whether a job downloads, uploads, or both:

    • pull: the job only downloads the cache; it never uploads changes (for jobs that don't modify dependencies)

    • push: the job only uploads the cache; it never downloads it (for the job that installs dependencies)

    • pull-push: the default; the job downloads the cache at the start and uploads it at the end

Quick Example for per-job cache
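A sketch matching the explanation below (the image tag is illustrative):

```yaml
test:
  stage: test
  image: python:3.12
  variables:
    PIP_CACHE_DIR: "$CI_PROJECT_DIR/.pip"
  cache:                   # no key specified, so the default key is used
    paths:
      - .pip/
      - venv/
  script:
    - python -m venv venv
    - source venv/bin/activate
    - pip install -r requirements.txt
    - pytest --cov --cov-report=html   # writes htmlcov/
  artifacts:
    paths:
      - htmlcov/index.html
```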

GitLab restores cached dependencies so tests start faster.

Cache (.pip/ and venv/) in the above example:

  • These directories are cached between pipeline runs

  • Next time this job runs, pip packages won't need to be re-downloaded from the internet

  • The virtual environment is preserved

  • No cache key specified = uses the default key (all jobs share this cache)

  • Makes subsequent pipeline runs faster

Artifacts (htmlcov/index.html) in the above example:
  • This is saved within the current pipeline run

  • The coverage report is preserved and displayed in GitLab's UI under the test report section

  • Available for download after the pipeline finishes

  • Not carried over to future pipeline runs

Another example comes with a predefined variable. CI_COMMIT_REF_SLUG is a GitLab predefined variable that contains a sanitized version of your branch or tag name, safe for use in URLs and file paths.
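For example, a per-branch cache:

```yaml
cache:
  key: $CI_COMMIT_REF_SLUG   # one cache per branch or tag
  paths:
    - node_modules/
```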

Another example with strategic caching:
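A sketch of a pipeline-wide cache declared under default::

```yaml
default:
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/
```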

When you set cache under default:, you’re saying:

“All jobs in this pipeline should use this cache configuration unless they choose to override it.”

What this last example means in practice

  • Every job will restore this cache before execution

  • Every job will update this cache after completion

  • If a job doesn’t need Node.js, it will still unnecessarily cache node_modules/

  • Jobs may accidentally overwrite each other’s cache

⚠️ This can cause cache pollution, because a job that shouldn’t be touching cache might still rewrite it.

Cache sharing with global or branch scopes

Sharing Cache Across Jobs

Using the Same Cache Key

To share cache across jobs, use the same cache key in all jobs that need to access it:
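A sketch with illustrative job names and a hypothetical key:

```yaml
install:
  stage: build
  cache:
    key: deps-cache
    paths: [node_modules/]
  script: npm ci

lint:
  stage: test
  cache:
    key: deps-cache
    paths: [node_modules/]
  script: npm run lint

test:
  stage: test
  cache:
    key: deps-cache
    paths: [node_modules/]
  script: npm test
```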

All three jobs now share the same cache.


Sharing Across ALL Branches

Use a static key (no branch variables):
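For example:

```yaml
cache:
  key: one-global-cache    # static key, shared by every branch and job
  paths:
    - node_modules/
```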

Without ${CI_COMMIT_REF_SLUG} or other dynamic variables, every branch and every job uses the same cache.


Global Cache Configuration

Define cache globally so all jobs inherit it:
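A sketch of a top-level cache definition:

```yaml
# top level of .gitlab-ci.yml; every job inherits this cache
cache:
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/
```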


Is Cache Automatically Applied?

Yes, but with conditions:

  1. Same key required: The job must specify the same cache key (or inherit it globally)

  2. Automatic download: GitLab automatically downloads and extracts the cache at the start of each job that requests it

  3. Automatic upload: Cache is automatically uploaded after the job's script section succeeds

  4. No explicit "restore" needed: You don't need to manually extract or apply it

Example flow:
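The flow annotated as comments on a single job (names are illustrative):

```yaml
test:
  cache:
    key: deps-cache
    paths:
      - node_modules/
  script:
    # 1. Before the script runs, GitLab downloads and extracts the
    #    cache stored under key "deps-cache" (if it exists).
    - npm ci               # 2. fast when node_modules/ was restored
    - npm test
    # 3. After the script succeeds, the cache is re-uploaded if it changed.
```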

Important notes:

Cache is not guaranteed: If the cache is cleared, evicted, or doesn't exist yet, the job still runs (just slower). Always ensure your jobs can work without cache.

First run has no cache: The first time a pipeline runs with a new key, there's no cache to download. Subsequent runs will have it.

Policy control: You can optimize by having only one job push to cache:
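A sketch where one job pushes and the rest only pull:

```yaml
install:
  stage: build
  cache:
    key: deps-cache
    paths: [node_modules/]
    policy: push           # only uploads the cache
  script: npm ci

test:
  stage: test
  cache:
    key: deps-cache
    paths: [node_modules/]
    policy: pull           # only downloads, never uploads
  script: npm test
```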

This prevents multiple jobs from trying to update the same cache simultaneously, which could cause conflicts.


Artifacts vs Cache — The Core Difference

  • Primary purpose: artifacts pass files to later stages; cache speeds up jobs by reusing data

  • Scope: artifacts live within the same pipeline; cache is shared across pipelines

  • Typical contents: artifacts hold build outputs and reports; cache holds dependencies and package caches

  • Persistence: artifacts are saved by GitLab and viewable in the UI; cache can be overwritten frequently

  • Availability: artifacts are available only to future stages; cache is available to any job that uses the same cache key

  • Expiration: artifacts expire by default (configurable); cache is not stored forever


How to Decide Which One You Need

Use artifacts when:

  • Another stage needs the exact output of a job

  • You want downloadable files in GitLab

  • You're producing build or test results

Use cache when:

  • You want to avoid reinstalling dependencies

  • You're optimizing frequent repetitive work

  • The data can be safely regenerated


In One Sentence

  • Artifacts = hand-off packages between stages.

  • Cache = reusable stash for speeding up work.


🚀 Pipelines in GitLab CI/CD

Variables docs, variable precedence docs

Pipelines are the backbone of GitLab’s automation system. They define what happens after you push code—building, testing, scanning, deploying, and everything in between. But as your project grows, so does pipeline complexity. Small configuration changes can snowball into hours of maintenance unless you design pipelines in a scalable, DRY, and predictable way.

One of the key tools GitLab gives you for that is variables.


🔧 Why Pipelines Need Variables

Imagine a team maintaining several GitLab pipeline files across multiple services. One morning, infrastructure updates the internal API endpoint:

“New internal API URL: services.internal.example.net”

That should have been a simple update… but instead, the team spends half a day searching for the old URL across multiple YAML files scattered across microservice repositories.

Later that day, the integration pipeline fails — because one job in one file still references the old URL.

The problem?

Hard-coded values buried deep in different pipeline definitions.


💡 Variables Fix This

With GitLab CI/CD variables, you replace repeated values with a single source of truth.

Before (hard-coded everywhere):
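A hedged illustration; the old hostname is hypothetical:

```yaml
integration-test:
  script:
    - curl "https://old-api.internal.example.net/health"   # hard-coded in many YAML files
```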

After (one change updates everything):
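The same jobs refactored around a single variable (names are illustrative):

```yaml
variables:
  API_URL: "https://services.internal.example.net"

test-integration:
  script:
    - curl "$API_URL/health"

deploy-staging:
  script:
    - ./deploy.sh --api-url "$API_URL"
```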

Change the variable once → every job automatically uses the updated value.


🧩 Types of CI/CD Variables

GitLab provides several kinds of variables, each with different use cases and scopes.


1. Predefined Variables

GitLab injects these automatically into every pipeline.

Examples:

  • CI_COMMIT_SHA — the commit’s full SHA

  • CI_COMMIT_REF_NAME — the branch or tag name

  • CI_PIPELINE_SOURCE — whether it was triggered by push, merge request, schedule, etc.

These are ideal for tagging images, tracking builds, and making pipelines dynamic.


2. Custom Variables

You define your own values—great for anything that:

  • changes between environments

  • appears multiple times

  • should be controlled from a single location

  • contains secrets (when masked/protected)

Examples:

  • URLs and API endpoints

  • Docker registry addresses

  • Feature flags

  • Version strings (e.g., NODE_VERSION, TERRAFORM_VERSION)

Where can you define them?

  • In the .gitlab-ci.yml

  • In GitLab’s UI (project/group/instance level)

  • At runtime (manual pipeline triggers)

  • In child pipelines

  • In components or includes


🧠 Variable Precedence (Why It Matters)

GitLab allows the same variable name to appear in multiple places. But which one wins?

For example, if API_URL is defined:

  • in the GitLab UI

  • in the .gitlab-ci.yml

  • inside a job

  • in a child pipeline

  • as a secret variable

GitLab has strict precedence rules to determine the final value. Higher-priority variables override lower ones.

Understanding precedence prevents extremely tricky bugs—like pipelines working in one branch but breaking in another because a variable value was overridden unintentionally.

(You can always check GitLab’s full precedence documentation when designing critical pipelines.)

Predefined variables

Common predefined variables

1. CI_COMMIT_SHA

  • What it is: The full commit hash of the current pipeline’s commit

  • Use case: Tagging builds or container images

  • Example:
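For instance, tagging a container image (registry path illustrative):

```yaml
build-image:
  script:
    - docker build -t registry.example.com/myapp:$CI_COMMIT_SHA .
    - docker push registry.example.com/myapp:$CI_COMMIT_SHA
```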


2. CI_COMMIT_SHORT_SHA

  • What it is: Shortened version of the commit SHA (usually first 8 characters)

  • Use case: Labeling artifacts or build folders for easier reference

  • Example:
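For instance, naming a build folder (paths illustrative):

```yaml
package:
  script:
    - mkdir -p builds/$CI_COMMIT_SHORT_SHA
    - cp app.tar.gz builds/$CI_COMMIT_SHORT_SHA/
  artifacts:
    paths:
      - builds/
```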


3. CI_COMMIT_REF_NAME

  • What it is: Name of the branch or tag for the current commit

  • Use case: Conditional deployments or environment routing

  • Example:
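For instance, deploying only from main (job and script names illustrative):

```yaml
deploy:
  script: ./deploy.sh
  rules:
    - if: $CI_COMMIT_REF_NAME == "main"
```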


4. CI_COMMIT_MESSAGE

  • What it is: The commit message that triggered the pipeline

  • Use case: Include context in logs, notifications, or deployment messages

  • Example:
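For instance, echoing context into a deployment log (sketch):

```yaml
notify:
  script:
    - echo "Deploying commit: $CI_COMMIT_MESSAGE"
```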


5. CI_PIPELINE_SOURCE

  • What it is: How the pipeline was triggered (push, schedule, merge_request, manual, etc.)

  • Use case: Run certain jobs only on scheduled or manual pipelines

  • Example:
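For instance, running a cleanup job only on scheduled pipelines (script name illustrative):

```yaml
nightly-cleanup:
  script: ./cleanup.sh
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```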


6. CI_DEFAULT_BRANCH

  • What it is: The project’s default branch, usually main or master

  • Use case: Conditional logic for jobs that should run only on the default branch

  • Example:
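For instance, comparing the current ref to the default branch (sketch):

```yaml
release:
  script: ./release.sh
  rules:
    - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
```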


7. CI_JOB_NAME

  • What it is: The name of the job currently running

  • Use case: Customize behavior, logging, or artifact naming per job

  • Example:
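For instance, naming an artifact after the job (sketch):

```yaml
build:
  script:
    - tar -czf "$CI_JOB_NAME-output.tar.gz" dist/
  artifacts:
    paths:
      - "*.tar.gz"
```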


Practical Patterns

  • Scheduled jobs: $CI_PIPELINE_SOURCE == "schedule" → run overnight tasks

  • Tagged deployments: $CI_COMMIT_TAG → deploy only tagged releases

  • Unique builds: $CI_COMMIT_SHORT_SHA → name builds and artifacts uniquely

These predefined variables give pipelines dynamic behavior without hard-coding values, making your CI/CD more maintainable and safe.

Custom variables in GitLab CI/CD

Custom variables let you manage project-specific or environment-specific configuration. Unlike predefined variables, they are created and maintained by your team. They are ideal for:

  • API endpoints

  • Database credentials

  • Feature flags

  • Version numbers

  • Deployment tokens

Custom variables help avoid hard-coding values in .gitlab-ci.yml and make pipelines more maintainable.


Where Custom Variables Are Defined

| Scope | Example Use |
| --- | --- |
| Pipeline-level (in .gitlab-ci.yml) | Non-sensitive values like build flags or version numbers, visible to everyone who can view the repo |
| Project-level (GitLab UI) | Sensitive variables like deployment tokens or API keys; only authorized users can see or modify them |
| Group-level (GitLab UI) | Shared values across multiple projects, e.g., company-wide Docker registry URLs or common deployment targets |


Example Usage

1. Pipeline-level variable

  • Visible in code

  • Used for build options, feature flags, or versions
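A minimal sketch (variable names and values are illustrative):

```yaml
variables:
  NODE_VERSION: "20"
  BUILD_FLAGS: "--production"

build:
  image: node:$NODE_VERSION
  script:
    - npm run build -- $BUILD_FLAGS
```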


2. Project-level variable

  • Stored securely in Project Settings

  • Only authorized users can view/modify

  • Ideal for secrets like deployment tokens or API keys
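Referencing a UI-defined variable in a job — here DEPLOY_TOKEN and the endpoint are assumptions, with the token defined under Settings → CI/CD → Variables:

```yaml
deploy:
  script:
    # DEPLOY_TOKEN never appears in the repo; GitLab injects it at runtime
    - curl --header "Authorization: Bearer $DEPLOY_TOKEN" "https://deploy.example.com/api/release"
```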


3. Group-level variable

  • Shared across all projects in the group

  • Great for common registry URLs, environment URLs, or company-wide configurations
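Any project in the group can reference it — here DOCKER_REGISTRY is assumed to be a group-level variable:

```yaml
build-image:
  script:
    - docker build -t "$DOCKER_REGISTRY/myapp:latest" .
    - docker push "$DOCKER_REGISTRY/myapp:latest"
```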


Custom Variable Best Practices

  1. Avoid hard-coding

    • Hard-coded values must be updated in multiple jobs and files when things change

    • Using custom variables centralizes updates

  2. Environment scope

    • Restrict a variable to a specific environment (e.g., production)

    • Prevents accidental use in other environments

  3. Protected variables

    • Only available on protected branches (e.g., main/master)

    • Prevents accidental exposure in feature branches

  4. Masked variables

    • Hides the value in job logs

    • Job scripts can still use the variable safely

    • Prevents secrets from being accidentally printed

  5. Masked + hidden

    • In addition to masking, the variable value is not visible in the UI

    • Can only be set when creating a new variable, not after creation

Real-world Scenario: API Migration

Imagine a team’s API provider changes all endpoints to a new domain.

Approaches:

  1. Hard-coded URLs

    • Must update every job manually (build, test, deploy-dev, deploy-prod, integration tests)

    • Error-prone, tedious

  2. Custom variables

    • Update the variable once → all jobs automatically use the new endpoint

    • Reduces maintenance, lowers risk of deployment failures

  3. Environment-specific if statements

    • Can control which endpoint to use per environment

    • More flexible than hard-coding, but less centralized than variables

Takeaway: Custom variables make pipelines more maintainable, safer, and easier to update.


Rules

Quick reference for Rules

Rules control when jobs run based on conditions you define. They're the primary way to make your pipelines dynamic and efficient.


Basic Structure

The logic:

  • Define a rules: block

  • Add one or more conditions (if, changes, exists)

  • Specify what happens when the condition is true (when)
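Put together, a minimal sketch (job name illustrative):

```yaml
test:
  script: ./run-tests.sh
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: on_success
```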


The if Clause

Use if to check variables (including GitLab's predefined variables):

Common conditions:
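A few frequently used if conditions (one rule per line):

```yaml
rules:
  - if: $CI_COMMIT_BRANCH == "main"                    # on the main branch
  - if: $CI_PIPELINE_SOURCE == "merge_request_event"   # in a merge request pipeline
  - if: $CI_COMMIT_TAG                                 # the commit is tagged
```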


The changes Clause

Run jobs only when specific files change - great for performance!
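For example (paths illustrative):

```yaml
build-frontend:
  script: npm run build
  rules:
    - changes:
        - frontend/**/*
```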

This prevents unnecessary work. If you only changed a README, why rebuild the entire application?


The exists Clause

Run jobs only if certain files exist:
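For example:

```yaml
docker-build:
  script: docker build -t myapp .
  rules:
    - exists:
        - Dockerfile
```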


The when Keyword

Controls what happens when a rule matches:

Options:

  • on_success (default) - Run the job if all previous stages succeeded

  • always - Run the job regardless of previous job status

  • never - Don't run the job

  • manual - Job requires manual approval in the UI

  • delayed - Wait before running (with start_in)

Examples
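A sketch combining several when values (job names illustrative):

```yaml
deploy-prod:
  script: ./deploy.sh
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual        # wait for approval in the UI
    - when: never         # otherwise, don't run at all

cleanup:
  script: ./cleanup.sh
  when: always            # run even if earlier jobs failed
```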


Multiple Rules (Evaluation Order)

Rules are evaluated top to bottom. The first match wins:
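A sketch (job name illustrative):

```yaml
deploy:
  script: ./deploy.sh
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: on_success
    - if: $CI_COMMIT_BRANCH == "staging"
      when: manual
    - when: never
```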

Logic:

  1. If branch is main → always deploy

  2. Else if branch is staging → require manual approval

  3. Otherwise → never run


Combining Conditions
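A single rule can combine if and changes; both must match for the rule to apply (paths illustrative):

```yaml
build-docs:
  script: ./build-docs.sh
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      changes:
        - docs/**/*
```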


Practical Real-World Example
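A sketch pulling the pieces together (job and file names illustrative):

```yaml
test:
  script: npm test
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - src/**/*
        - package.json
    - if: $CI_COMMIT_BRANCH == "main"

deploy:
  script: ./deploy.sh
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
    - when: never
```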


Why Use Rules?

Save time - Don't run unnecessary jobs

Save resources - Fewer compute minutes = lower costs

Faster feedback - Developers get results quicker

Control deployment - Prevent accidental production deployments

Improve efficiency - Only test what changed

More examples of using rules

These examples add a few important concepts:


1. Deploy Only from Main - The "Catch-All" Pattern
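A sketch of the pattern (job name illustrative):

```yaml
deploy:
  script: ./deploy.sh
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
    - when: never   # catch-all: block the job in every other case
```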

Key insight: The final when: never acts as a default/fallback. If no previous rule matches, the job won't run. This is a common pattern to explicitly block a job unless conditions are met.


2. Merge Request Trigger
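A sketch (job name illustrative):

```yaml
mr-checks:
  script: ./validate.sh
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```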

Key insight: The CI_PIPELINE_SOURCE variable identifies how the pipeline was triggered:

  • "merge_request_event" - From a merge request

  • "web" - From the GitLab web UI

  • "push" - From a git push

  • "schedule" - From a scheduled pipeline

  • "api" - From API call

This is crucial for running jobs only in specific contexts (e.g., run extra validation only on MRs).


3. Workflow Rules - Pipeline-Level Control

Major new concept: workflow: applies rules to the entire pipeline, not just individual jobs.
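A sketch implementing the behavior described here (note the regex matches against $CI_COMMIT_TITLE, the first line of the commit message, which avoids trailing-newline surprises):

```yaml
workflow:
  rules:
    - if: $CI_COMMIT_TITLE =~ /-wip$/
      when: never          # skip work-in-progress commits
    - if: $CI_COMMIT_TAG
      when: never          # skip pipelines for tags
    - when: always         # otherwise, run the pipeline
```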

What this does:

  • If commit message ends with -wip → entire pipeline is blocked

  • If triggered by a tag → entire pipeline is blocked

  • Otherwise → pipeline runs

Use cases:

  • Skip pipelines for work-in-progress commits

  • Prevent pipelines on tag creation

  • Only run pipelines on specific branches globally


Optimizing Job Run Order

Docs: Controlling how jobs run, Pipeline efficiency

There are two main strategies for optimizing pipeline performance: changing job execution order and parallelizing slow jobs.


1. The needs Keyword - Optimize Job Order

By default, GitLab runs jobs in stages sequentially - all jobs in one stage must complete before the next stage starts:

Problem: Even if test only needs build to finish, it must wait for every job in the build stage.

Solution: Use needs
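A sketch (job names illustrative):

```yaml
stages: [build, test]

build-job:
  stage: build
  script: ./build.sh

lint-job:
  stage: build
  script: ./lint.sh

test-job:
  stage: test
  needs: [build-job]   # starts when build-job finishes, without waiting for lint-job
  script: ./test.sh
```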

Result: Jobs start as soon as their specific dependencies finish, not when the entire stage completes.

More Complex Example
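A sketch with two independent build/test tracks (names illustrative):

```yaml
build-backend:
  stage: build
  script: ./build-backend.sh

build-frontend:
  stage: build
  script: ./build-frontend.sh

test-backend:
  stage: test
  needs: [build-backend]    # doesn't wait for build-frontend
  script: ./test-backend.sh

test-frontend:
  stage: test
  needs: [build-frontend]   # doesn't wait for build-backend
  script: ./test-frontend.sh
```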

Without needs: test-backend waits for both backend AND frontend to build.

With needs: test-backend starts as soon as build-backend finishes, even if build-frontend is still running.


2. The parallel Keyword - Speed Up Individual Jobs

When a single job is slow because it has too much work, split it across multiple runners:
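A minimal sketch:

```yaml
test:
  script: ./run-tests.sh
  parallel: 4   # GitLab creates 4 copies of this job
```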

How It Works

GitLab creates 4 identical copies of the job (each can land on a different runner) and provides special variables:

  • $CI_NODE_TOTAL = 4 (total number of parallel jobs)

  • $CI_NODE_INDEX = 1, 2, 3, or 4 (which parallel job this is)

Your test framework uses these to split the work:
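A generic sketch — the --shard flag here is a placeholder for whatever splitting mechanism your framework actually provides:

```yaml
test:
  parallel: 4
  script:
    - ./run-tests.sh --shard "$CI_NODE_INDEX/$CI_NODE_TOTAL"
```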

Each parallel job executes 1/4 of the tests simultaneously.

Real-World Scenario

Before (100 tests on 1 runner):

  • Time: 400 seconds

  • CPU: maxed out

After (100 tests on 4 runners):

  • Time: ~100 seconds (4x faster)

  • CPU per runner: ~25% each

  • Total job time: significantly reduced


Parallel Examples for Different Tools

Jest (JavaScript)

pytest (Python)

RSpec (Ruby)

Manual splitting
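Hedged sketches per tool — each assumes the named tool's splitting support: Jest's --shard flag ships with Jest 28+, pytest needs the pytest-split plugin, the RSpec example uses the Knapsack Pro gem, and the manual version divides a file list by index:

```yaml
# Jest (JavaScript): built-in sharding (Jest 28+)
test-jest:
  parallel: 4
  script:
    - npx jest --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

# pytest (Python): requires the pytest-split plugin
test-pytest:
  parallel: 4
  script:
    - pytest --splits $CI_NODE_TOTAL --group $CI_NODE_INDEX

# RSpec (Ruby): Knapsack Pro reads CI_NODE_* automatically on GitLab
test-rspec:
  parallel: 4
  script:
    - bundle exec rake knapsack_pro:rspec

# Manual splitting: divide the test-file list yourself
test-manual:
  parallel: 4
  script:
    - FILES=$(find tests -name '*_test.sh' | sort | awk "NR % $CI_NODE_TOTAL == $CI_NODE_INDEX - 1")
    - for f in $FILES; do bash "$f"; done
```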


Combining needs and parallel

You can use both strategies together for maximum optimization:
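A sketch combining the two (the --shard flag is a placeholder for your framework's splitting mechanism):

```yaml
build:
  stage: build
  script: ./build.sh

test:
  stage: test
  needs: [build]       # start right after build, skip waiting for the stage
  parallel: 4          # split the suite across 4 parallel jobs
  script: ./run-tests.sh --shard "$CI_NODE_INDEX/$CI_NODE_TOTAL"

deploy:
  stage: deploy
  needs: [test]        # start as soon as all test shards finish
  script: ./deploy.sh
```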

Benefits:

  • Tests start immediately after build (don't wait for each other)

  • Each test suite runs in parallel (faster execution)

  • Deploy starts as soon as last test finishes


Parallel Matrix Strategy

You can also use parallel:matrix to run the same job with different configurations:
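A sketch (version and database values illustrative):

```yaml
test:
  parallel:
    matrix:
      - RUBY_VERSION: ["3.0", "3.1", "3.2"]
        DATABASE: ["postgres", "mysql"]
  image: ruby:$RUBY_VERSION
  script: ./test.sh --db "$DATABASE"
```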

This creates 6 jobs (3 Ruby versions × 2 databases) that run simultaneously.


Key Takeaways

needs keyword:

  • ✅ Optimizes job order

  • ✅ Jobs start as soon as dependencies finish

  • ✅ Reduces total pipeline time

parallel keyword:

  • ✅ Optimizes individual job runtime

  • ✅ Distributes work across multiple runners

  • ✅ Reduces CPU bottlenecks

  • ✅ Requires test framework support for sharding

Best practice: Use both together for maximum speed!


Managing Complexity in Gitlab CI/CD

As projects grow, CI/CD configurations can become unwieldy. GitLab provides several pipeline types to manage this complexity.


The Problem

Large projects face:

  • Huge config files (hundreds of lines in .gitlab-ci.yml)

  • Distributed teams wanting control over their own configuration

  • Unnecessary pipeline runs for commits that don't need CI


Solution 1: Parent-Child Pipelines

Split large configurations into smaller, manageable files within the same repository.

Example
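A sketch of a parent config that triggers per-area child pipelines (paths illustrative):

```yaml
# .gitlab-ci.yml (parent)
trigger-frontend:
  trigger:
    include: frontend/.gitlab-ci.yml   # child pipeline config in the same repo
  rules:
    - changes:
        - frontend/**/*

trigger-backend:
  trigger:
    include: backend/.gitlab-ci.yml
  rules:
    - changes:
        - backend/**/*
```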

How It Works

  1. Parent pipeline (main .gitlab-ci.yml) detects changes

  2. Only triggers child pipeline for the affected area

  3. Child pipeline runs independently with its own configuration

Benefits

Modularity - Each team manages their own .gitlab-ci.yml

Performance - Only relevant pipelines run (frontend changes don't trigger backend tests)

Reduced complexity - Smaller, focused configuration files

Parallel execution - Child pipelines run concurrently

Easier to understand - Each file contains only relevant jobs


Solution 2: Multi-Project Pipelines

Trigger pipelines in different repositories - useful for microservices or split codebases.

Example
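A sketch of a trigger job that starts a pipeline in another repository (project path and change paths illustrative):

```yaml
# in the main application's .gitlab-ci.yml
trigger-payment-tests:
  trigger:
    project: payments-team/payment-service   # full path to the downstream project
    branch: main
  rules:
    - changes:
        - payments/**/*
```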

Real-World Scenario

Your e-commerce site has:

  • Main application in ecommerce/main-app

  • Payment service in payments-team/payment-service

  • Shipping service in logistics/shipping-service

When you change payment-related code in the main app, it automatically triggers tests in the payment service repository to ensure compatibility.

Benefits

Cross-repo coordination - Test dependencies across repositories

Microservices architecture - Each service has its own repo and CI

Team independence - Payment team controls their pipeline

Integration testing - Verify services work together


Solution 3: Merge Request Pipelines

Run different jobs for merge requests vs. regular branch pushes.

Example
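A minimal sketch: run tests on MR pipelines and on main, nothing else (names illustrative):

```yaml
unit-tests:
  script: npm test
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```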

Common Patterns

Skip deployments in MRs:

Run extra checks only on MRs:

Different behavior for MRs vs. main:
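Sketches of the three patterns (job names and scripts illustrative):

```yaml
# Skip deployments in MRs
deploy-staging:
  script: ./deploy.sh staging
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: never
    - if: $CI_COMMIT_BRANCH == "main"

# Run extra checks only on MRs
lint:
  script: npm run lint
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

# Different behavior for MRs vs. main
test:
  script:
    - |
      if [ "$CI_PIPELINE_SOURCE" = "merge_request_event" ]; then
        npm run test:quick
      else
        npm run test:full
      fi
```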

Benefits

Faster feedback - Developers get quick results on MRs

Save resources - Don't deploy to staging for every MR

Targeted testing - Run different tests in different contexts

Cost optimization - Skip expensive jobs when not needed


Combining All Three

Real-world complex setup:
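One way the three types can combine (all names and paths illustrative):

```yaml
workflow:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"   # MR pipelines
    - if: $CI_COMMIT_BRANCH == "main"                    # and main-branch pipelines

frontend:
  trigger:
    include: frontend/.gitlab-ci.yml      # parent-child: child pipeline in the same repo
  rules:
    - changes:
        - frontend/**/*

payment-integration:
  trigger:
    project: payments-team/payment-service   # multi-project: pipeline in another repo
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - payments/**/*
```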


Pipeline Type Comparison

| Type | Use Case | Scope |
| --- | --- | --- |
| Basic | Simple projects | Single file, sequential stages |
| With needs | Optimize dependencies | Single file, parallel execution |
| Parent-Child | Large monorepos, team separation | Same repo, multiple config files |
| Multi-Project | Microservices, cross-repo dependencies | Different repositories |
| Merge Request | Different behavior for MRs vs branches | Context-aware execution |


Key Takeaways

Parent-Child Pipelines:

  • Break up large configs

  • Team ownership of their pipeline

  • Only run what changed

Multi-Project Pipelines:

  • Coordinate across repositories

  • Test microservice integrations

  • Maintain service independence

Merge Request Pipelines:

  • Faster developer feedback

  • Skip unnecessary jobs

  • Context-specific testing

Best Practice: Use the simplest pipeline type that solves your problem. Start simple, add complexity only when needed.


Gitlab Registries

Course to learn about Package, Container, and Terraform Registries


Docker in Docker

