Stop Watching Progress Bars: Optimizing CI/CD IOPS and Latency in 2018

There is nothing more demoralizing for a development team than a 25-minute build for a three-line CSS change. I have watched talented engineers lose focus, tab out to Reddit, and break flow entirely because their CI pipeline was choking on disk I/O.

It is now December 2018. If you are still running your Jenkins or GitLab runners on standard HDD storage or over-sold VPS instances with noisy neighbors, you are actively burning money. The bottleneck in modern Continuous Integration isn't usually CPU; it is almost always I/O wait times and network latency during dependency resolution.

We are going to fix that. We will look at optimizing the Docker storage driver, implementing aggressive caching, and understanding why physical proximity to NIX (the Norwegian Internet Exchange) matters more than you think.

The Hidden Killer: Disk I/O and the Overlay2 Driver

When a CI job starts, it pulls an image, extracts layers, and installs dependencies. npm install, pip install, and mvn install are all essentially torture tests for random write speeds. If your hosting provider throttles IOPS, your build hangs.
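You can get a rough feel for whether small-file churn is your bottleneck without installing anything. The sketch below times the creation of a few thousand tiny files, which loosely approximates the write pattern of extracting node_modules. The file count and sizes are illustrative; for serious numbers, use fio with a 4K random-write job instead.

```shell
#!/bin/sh
# Crude small-file churn benchmark: approximates the write pattern of a
# dependency install (thousands of tiny files). For real measurements,
# prefer fio with a randwrite 4k job.
DIR=$(mktemp -d)
COUNT=2000
START=$(date +%s%N)   # nanoseconds since epoch (GNU date)
i=0
while [ "$i" -lt "$COUNT" ]; do
  printf 'x' > "$DIR/f$i"
  i=$((i + 1))
done
sync                  # flush so we actually pay the write cost
END=$(date +%s%N)
ELAPSED_MS=$(( (END - START) / 1000000 ))
rm -rf "$DIR"
echo "wrote $COUNT small files in ${ELAPSED_MS} ms"
```

Run it on your laptop and on your CI runner; if the runner is an order of magnitude slower, you have found your problem.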

First, ensure your Docker daemon is using the overlay2 storage driver. In earlier versions of Docker, devicemapper was common but performed poorly. By Docker CE 18.06 (and the current 18.09), overlay2 is the preferred choice for Linux kernel 4.0+, which you should be running if you are on Ubuntu 18.04 LTS.

Check your current driver:

docker info | grep Storage

If it returns aufs or devicemapper, switch. Be aware that changing the storage driver makes existing images and containers invisible to Docker (the old data stays on disk under the previous driver's directory), so plan to re-pull or prune. Here is how to force overlay2 in your /etc/docker/daemon.json:

{
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Pro Tip: Never let Docker logs grow infinitely on a CI server. The `log-opts` above rotate logs automatically. I've seen entire production pipelines halt because a runaway container filled the disk with stdout logs.
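A syntax error in daemon.json will stop dockerd from starting entirely, so lint the file before you restart the daemon. A minimal sketch, using python3 purely as a JSON checker and validating a copy written to a temp path; on a real host, point CONF at /etc/docker/daemon.json and follow up with systemctl restart docker:

```shell
#!/bin/sh
# Lint daemon.json before restarting Docker: invalid JSON here means
# dockerd will refuse to start at all.
CONF=$(mktemp)   # use /etc/docker/daemon.json on a real host
cat > "$CONF" <<'EOF'
{
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-opts": { "max-size": "10m", "max-file": "3" }
}
EOF
if python3 -m json.tool "$CONF" > /dev/null; then
  RESULT=ok
  echo "daemon.json OK - safe to restart the daemon"
else
  RESULT=bad
  echo "daemon.json is invalid JSON - do NOT restart yet" >&2
fi
rm -f "$CONF"
```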

Caching Strategies: Stop Downloading the Internet

Downloading the same libraries on every commit is inefficient. With GDPR now fully enforceable as of May this year, relying on US-based mirrors for every artifact fetch is also a compliance risk if you aren't careful about what data is traversing boundaries. Keep it local.

1. Docker Layer Caching

In your Dockerfile, order matters. Copy your dependency definitions (package.json, pom.xml) before your source code. This allows Docker to cache the installation layer unless dependencies change.

# Bad Practice
COPY . .
RUN npm install

# Good Practice (2018 Standard)
FROM node:10-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install
COPY . .
CMD ["npm", "start"]

2. GitLab CI Cache Configuration

If you are using GitLab CI (very popular in the Nordic dev scene right now), you must define the cache correctly in .gitlab-ci.yml. Do not share one global cache across branches with different dependencies. Current GitLab releases cannot key a cache directly on the lock file's checksum, so the pragmatic approach is to scope the cache per branch with ${CI_COMMIT_REF_SLUG}:

cache:
  key: ${CI_COMMIT_REF_SLUG}
  paths:
    - node_modules/

stages:
  - build

build_job:
  stage: build
  script:
    - npm install
    - npm run build
  tags:
    - docker
    - nvme
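One related knob worth knowing: since GitLab 9.4, cache:policy lets jobs that only read the cache skip the (slow) re-upload at the end of the job. A sketch, assuming you add a test stage that consumes node_modules without modifying it:

```yaml
test_job:
  stage: test
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - node_modules/
    policy: pull   # download the cache, but skip the upload at job end
  script:
    - npm test
```

On a large node_modules tree, skipping the archive-and-upload step can shave a minute or more off every downstream job.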

The Infrastructure Reality: Why NVMe Matters

Software optimization can only go so far. Eventually, you hit the physics of the hardware. A standard SATA SSD pushes about 600 MB/s. An NVMe drive, utilizing the PCIe bus, pushes 3,500 MB/s. And sequential throughput actually undersells the gap: in a CI environment where you are creating and destroying thousands of tiny files per minute, what matters is random 4K IOPS, where NVMe sustains hundreds of thousands of operations per second against a SATA SSD's tens of thousands. This difference is not incremental; it is an order of magnitude.

At CoolVDS, we switched our standard offering to NVMe because we saw high-churn workloads (like CI/CD and databases) suffering on standard SSDs due to IOPS throttling. When a neighbor on a shared host starts a backup, your build time shouldn't double.

Comparison: Build Time for a React App (Standard Template)

Storage Type        Clone & Install Time    Build Time    Total
HDD (Legacy VPS)    4m 12s                  3m 45s        7m 57s
SATA SSD            1m 30s                  1m 15s        2m 45s
CoolVDS NVMe        0m 45s                  0m 35s        1m 20s

Data Sovereignty and Latency

For Norwegian teams, latency to the repo and the artifact registry is often overlooked. If your servers are in Frankfurt but your office is in Oslo, you are adding 20-30ms round trip time to every single handshake. In a build process involving thousands of requests, this accumulates.
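The accumulation is easy to underestimate. A back-of-envelope sketch, where all three inputs are illustrative assumptions rather than measurements: roughly three round trips per fresh HTTPS connection (TCP handshake plus a full TLS 1.2 handshake), and a build that makes a couple of thousand registry and metadata requests.

```shell
#!/bin/sh
# Back-of-envelope latency cost. All three inputs are illustrative
# assumptions -- substitute numbers from your own pipeline.
RTT_MS=25            # Oslo <-> Frankfurt round trip
TRIPS_PER_REQUEST=3  # TCP handshake + full TLS 1.2 handshake
REQUESTS=2000        # dependency + metadata fetches per build
TOTAL_MS=$((RTT_MS * TRIPS_PER_REQUEST * REQUESTS))
echo "latency overhead: $((TOTAL_MS / 1000))s per build"
```

Connection reuse and keep-alive cut the per-request cost in practice, but a 2 ms local round trip still beats a 25 ms one on every connection that cannot be pooled.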

Furthermore, with the Data Inspectorate (Datatilsynet) becoming stricter post-GDPR implementation, knowing exactly where your build artifacts and test databases reside is critical. Using a VPS provider with data centers physically located in Norway eliminates ambiguity regarding data residency. It ensures your intellectual property never leaves the jurisdiction.

Setting Up a Local Registry Mirror

To reduce latency further, run a local Nexus or simple Docker registry mirror on your VPS. This allows you to pull images through the internal network rather than the public internet.

# Run as a pull-through cache of Docker Hub; drop the PROXY_REMOTEURL
# variable to use it as a plain private registry instead.
docker run -d -p 5000:5000 --restart=always --name registry \
  -e REGISTRY_PROXY_REMOTEURL=https://registry-1.docker.io \
  -v /mnt/nvme/registry:/var/lib/registry \
  registry:2

Configure your CI runners to treat it as an insecure registry (only if traffic stays strictly internal), or secure it with a Let's Encrypt certificate via certbot. Either way, pull traffic stays inside your private network or on the CoolVDS high-speed internal backbone.
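On the runner side, Docker is told about the mirror via registry-mirrors in its own daemon.json. This assumes the registry container runs in pull-through mode (REGISTRY_PROXY_REMOTEURL set); the 10.0.0.5 address below is a placeholder for your mirror's internal IP:

```json
{
  "registry-mirrors": ["http://10.0.0.5:5000"],
  "insecure-registries": ["10.0.0.5:5000"]
}
```

The insecure-registries entry is only needed while the mirror speaks plain HTTP; drop it once the mirror has a valid TLS certificate.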

Conclusion

You don't need to rewrite your entire codebase to get faster builds in 2018. You need to upgrade your underlying assumptions about storage and caching.

  1. Force overlay2 drivers.
  2. Cache dependencies aggressively based on lockfiles.
  3. Move workloads to NVMe storage to eliminate I/O wait.
  4. Keep data in Norway to satisfy Datatilsynet and reduce network hops.

If you are tired of fighting with sluggish, over-sold instances, it is time for a change. Don't let slow I/O kill your release cadence or your developers' morale.

Deploy a high-performance CI runner on CoolVDS today. Our KVM-based NVMe instances are provisioned in under 55 seconds.