Stop Blaming Jenkins: Why Your CI/CD Pipeline Crawls and How to Fix It

I watched a senior developer stare at a spinning circle for 45 minutes last Tuesday. He wasn't compiling the Linux kernel. He was waiting for a simple React frontend build to pass through a shared runner queue. This is the silent killer of engineering velocity. We obsess over writing clean code, but then we throw it into a CI/CD pipeline running on oversubscribed hardware that hasn't been upgraded since 2019.

If you are deploying critical infrastructure in Norway or the broader Nordics, relying on default shared runners from major cloud providers is a rookie mistake. The latency kills you. The "noisy neighbors" steal your CPU cycles. And frankly, the disk I/O on standard cloud instances is pathetic. I’ve fixed enough broken pipelines to know the culprit is rarely the software—it’s the metal underneath.

The Bottleneck You Can't See: Disk I/O Wait

CI/CD is basically a disk torture test. `npm install`, `docker build`, `git clone`—these are all heavy I/O operations. When you run these on a cheap VPS or a shared runner, you are fighting for IOPS with everyone else on that hypervisor. If your disk latency spikes, your build time doubles. It is that simple.

I logged into a client's build server recently that was "mysteriously slow." One command revealed the truth.

iostat -x 1 10

The %iowait was sitting at 35%. The CPU was idle, waiting for the disk to wake up. This is why we insist on NVMe storage for all build infrastructure. At CoolVDS, we don't use spinning rust or SATA SSDs for a reason. You need the high throughput of NVMe to handle the thousands of tiny file writes that node modules or cargo crates generate.

Pro Tip: If your build server's `await` time in `iostat` exceeds 5ms consistently, your storage solution is trash. Move to an NVMe-backed instance immediately.
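A quick way to check is to watch the device-level latency columns directly. This is a minimal sketch; depending on your sysstat version the column is a single `await` or split into `r_await`/`w_await`.

# Extended device stats, one-second samples, five iterations;
# latency consistently above ~5ms points at the storage layer, not your code
iostat -dx 1 5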

Latency Matters: The Norwegian Context

Let's talk about network topology. If your team is in Oslo and your Git repository is hosted in US-East, you are adding unnecessary milliseconds to every `git fetch` and artifact upload. For Norwegian dev teams, data sovereignty and latency are linked. Hosting your runners locally, specifically on infrastructure peered directly at NIX (Norwegian Internet Exchange), drastically reduces the time it takes to push large Docker images to your registry.
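If you want to put numbers on it, run a path report from your runner to the Git host or registry. The hostname below is a placeholder; swap in your actual endpoint.

# 10-cycle latency/loss report from the runner to the remote endpoint
mtr --report --report-cycles 10 gitlab.example.com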

Furthermore, GDPR and Schrems II compliance isn't just a legal headache; it's an architectural constraint. Running builds on US-owned shared runners can expose temporary secrets or PII to jurisdictions you want to avoid. Self-hosting runners on a Norwegian VPS gives you full control over the data lifecycle.

The Architecture: Self-Hosted Runners with BuildKit

Stop using the default `docker-in-docker` (dind) setup without optimization. It's slow and insecure. Here is the reference architecture we deploy for high-performance pipelines:

  1. Infrastructure: CoolVDS KVM instance (4 vCPU, 8GB RAM, NVMe).
  2. OS: Debian 12 (Bookworm) or Ubuntu 24.04 LTS.
  3. Container Runtime: Docker Engine with BuildKit enabled.
  4. Caching: Local volume mounts for package managers.

1. Enabling BuildKit for Parallel Builds

If you aren't using BuildKit in 2024, you are wasting time. It allows for parallel build stages and smarter caching. Enable it globally on your runner.

export DOCKER_BUILDKIT=1
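The export only covers the current shell session, and on recent Docker Engine releases (23.0+) BuildKit is already the default builder. If you want it pinned at the daemon level regardless of how jobs invoke `docker build`, one option is the feature flag in `daemon.json` — a minimal sketch, assuming a stock Docker Engine install:

# Enable BuildKit daemon-wide, then restart Docker
# (merge this with any existing /etc/docker/daemon.json instead of overwriting it)
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{ "features": { "buildkit": true } }
EOF
sudo systemctl restart docker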

2. Optimizing the Dockerfile

Don't just copy your source code and run `npm install`. That busts the cache every time a file changes. Use bind mounts in your Dockerfile to leverage the host's cache capabilities.

# syntax=docker/dockerfile:1.9
FROM node:20-alpine
WORKDIR /app

# Leverage a cache mount to /root/.npm to speed up subsequent builds
RUN --mount=type=bind,source=package.json,target=package.json \
    --mount=type=bind,source=package-lock.json,target=package-lock.json \
    --mount=type=cache,target=/root/.npm \
    npm ci --include=dev

COPY . .
RUN npm run build

This configuration alone cut one client's build time from 8 minutes to 90 seconds. The cache persists on the CoolVDS runner's NVMe disk between jobs.
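For reference, the job step itself stays boring. Something like the following is all the runner needs to execute; the registry path is a placeholder for your own.

# Build with BuildKit and push to your private registry
DOCKER_BUILDKIT=1 docker build -t registry.example.com/frontend:latest .
docker push registry.example.com/frontend:latest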

Automating the Runner Deployment

Manual setup is for amateurs. We use Ansible to provision runners so we can kill them and recreate them if they get tainted. Here is a stripped-down playbook snippet we use to deploy a GitLab Runner on a fresh CoolVDS instance.

---
- hosts: build_runners
  become: true
  vars:
    gitlab_url: "https://gitlab.com/"
    registration_token: "{{ vault_runner_token }}"

  tasks:
    # Note: the gitlab-runner package ships from GitLab's own apt repository
    # (packages.gitlab.com); make sure that repo is configured on the host first.
    - name: Install Docker and dependencies
      apt:
        name: ["docker.io", "gitlab-runner", "htop"]
        state: present
        update_cache: yes

    - name: Register GitLab Runner
      command: >
        gitlab-runner register 
        --non-interactive 
        --url "{{ gitlab_url }}" 
        --registration-token "{{ registration_token }}" 
        --executor "docker" 
        --docker-image "alpine:latest" 
        --description "coolvds-nvme-runner-01" 
        --tag-list "nvme,norway,high-perf" 
        --run-untagged="true" 
        --locked="false"
      args:
        creates: /etc/gitlab-runner/config.toml

    - name: Optimize Runner Concurrency
      lineinfile:
        path: /etc/gitlab-runner/config.toml
        regexp: '^concurrent ='
        line: 'concurrent = 4'
      notify: restart_runner

  handlers:
    - name: restart_runner
      service:
        name: gitlab-runner
        state: restarted
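Running it is a single command from your workstation or a bootstrap job. The inventory and playbook filenames here are just examples; --ask-vault-pass unlocks the vault_runner_token referenced in the vars.

# Provision and register the runner in one pass
ansible-playbook -i inventory/runners.ini provision-runner.yml --ask-vault-pass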

The "Steal Time" Trap

When you use shared hosting, you are at the mercy of the hypervisor's scheduler. If another tenant spikes their usage, your build slows down. You can check this with `top` or `htop`.

top -b -n 1 | grep "Cpu(s)"

Look at the st (steal) value at the end of the line. On a quality KVM provider like CoolVDS, this should be 0.0%. If you see numbers like 2.0% or 5.0% on your current host, you are paying for CPU cycles you aren't getting. This creates "flaky builds" that pass sometimes and timeout others. Consistency is key for CI/CD reliability.
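`vmstat` gives you the same signal over time; the last column (`st`) is steal.

# Five one-second samples; a persistently non-zero st column means
# the hypervisor is handing your CPU cycles to someone else
vmstat 1 5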

Comparison: Shared Runners vs. Dedicated CoolVDS Runner

| Feature | Typical Shared Runner | CoolVDS Private Runner |
| --- | --- | --- |
| Disk I/O | Unpredictable (Network Storage) | Consistent NVMe Local Storage |
| Caching | Re-download everything (Stateless) | Persistent Docker Layer Caching |
| Security | Multi-tenant Environment | Isolated KVM Kernel |
| Cost Control | Per-minute billing (Expensive at scale) | Flat monthly fee (Predictable) |

Security Considerations for 2024

Since we are running Docker, we must address the security implications. Mounting the Docker socket `/var/run/docker.sock` can be dangerous if you run untrusted code. For high-security environments, we recommend using Kaniko or Podman for rootless builds. However, if you control the code, Docker is faster.
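As a sketch of the rootless route, a Podman build runs as an ordinary CI user with no daemon and no socket to expose; the image tag is just an example.

# Rootless build: no docker.sock mount, no privileged container required
podman build -t frontend:ci .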

To keep the environment clean, set up a cron job to prune unused objects. Don't let your NVMe drive fill up with dangling layers.

0 3 * * * /usr/bin/docker system prune -af --filter "until=24h"

This job runs daily at 3 AM and clears out unused containers, networks, images, and build cache older than 24 hours. It keeps your runner lean without nuking the recently-used cache you need for speed.
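One way to wire that up is a drop-in under /etc/cron.d. The filename is arbitrary; note the extra user field that cron.d entries require.

# Install the prune job system-wide, running as root
echo '0 3 * * * root /usr/bin/docker system prune -af --filter "until=24h"' | sudo tee /etc/cron.d/docker-prune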

Final Verdict

Your developers cost too much to have them waiting on a slow CI pipeline. The math is simple: saving 10 minutes per build, multiplied by 20 builds a day, is over 3 hours of engineering time saved daily.

You don't need a complex Kubernetes cluster for this. You need raw, unadulterated compute power with low latency and fast disks. That is exactly what we engineered CoolVDS to provide. Spin up a dedicated instance, install a runner, and watch your pipeline go green before you can even switch context.

Ready to stop waiting? Deploy a high-performance runner on CoolVDS today and get back to shipping code.