Unblocking the Pipeline: Why High I/O Latency is Killing Your Deployment Frequency
I watched an npm install hang for four minutes yesterday. It wasn't the network. It wasn't the registry. It was the disk. If you are running your CI/CD runners on budget, oversold VPS hosts, you are paying for that "savings" with hours of your developers' time every week. In the DevOps world, waiting is the enemy. And in 2023, with the complexity of microservices architectures, waiting for I/O to catch up during a build is unacceptable.
As someone who has managed infrastructure for high-traffic platforms across Oslo and Bergen, I've seen the pattern repeatedly. A team complains that their pipelines are flaky or slow. They blame Jenkins. They blame GitLab. But when we dig into the metrics, it's almost always the infrastructure underneath gasping for air. Specifically, it's I/O Wait and CPU Steal Time.
The Hidden Cost of "Shared" Resources
Most hosting providers sell you vCPUs. What they don't tell you is how many other neighbors are fighting for the cycles on that physical core. When a neighbor spins up a massive compilation job, your runner gets paused. In top, this shows up as %st (steal time).
But the real killer in CI/CD is disk I/O. Extracting node_modules, building Docker layers, and compiling Go binaries are intensely I/O-heavy operations. If you are on standard SSDs (or, heaven forbid, spinning rust) shared among 50 other tenants, your latency spikes.
Pro Tip: Run iostat -x 1 during your build process. If your %iowait consistently exceeds 5-10%, your storage backend is the bottleneck, not your CPU. You need NVMe.
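If sysstat isn't installed on a disposable runner, a rough sketch like the one below reads /proc/stat directly instead; the five-second window is an arbitrary assumption, so run it while a real build is in flight to get meaningful numbers.
# Fields on the first "cpu" line of /proc/stat: user nice system idle iowait irq softirq steal ...
read -r _ u1 n1 s1 i1 w1 q1 sq1 st1 _ < /proc/stat; sleep 5
read -r _ u2 n2 s2 i2 w2 q2 sq2 st2 _ < /proc/stat
total=$(( (u2+n2+s2+i2+w2+q2+sq2+st2) - (u1+n1+s1+i1+w1+q1+sq1+st1) ))
echo "iowait: $(( 100 * (w2 - w1) / total ))%  steal: $(( 100 * (st2 - st1) / total ))%"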
Diagnosis: Is Your Runner Choking?
Before we optimize, we verify. On your current CI runner, install ioping to check disk latency. We want to simulate the random read/write patterns of a build process.
apt-get update && apt-get install ioping -y
# Check latency
ioping -c 10 .
On quality infrastructure like CoolVDS, you should see averages well under 200 microseconds. On a crowded shared host, I've seen this jump to 5 milliseconds or more. That adds up when you are writing 50,000 small files during a build.
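If you want to feel that difference rather than trust the averages, a crude sketch like this times the creation of a few thousand tiny files; the file count and the /tmp/io-smoke path are arbitrary choices, and a real node_modules extraction writes an order of magnitude more.
mkdir -p /tmp/io-smoke && cd /tmp/io-smoke
# Write 5,000 tiny files and force them to disk, timing the whole run
time bash -c 'for i in $(seq 1 5000); do echo "payload" > "f_$i"; done; sync'
cd / && rm -rf /tmp/io-smoke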
Optimizing GitLab Runner for Performance
Let’s assume you are using GitLab CI, which is standard for many Norwegian dev teams due to its robust self-hosted options (great for data sovereignty compliance). The default configuration is rarely optimized for modern hardware.
Here is a battle-tested config.toml snippet used for high-concurrency builds. Note the concurrent limit and the use of Docker socket binding instead of Docker-in-Docker (dind). While dind offers better isolation, binding /var/run/docker.sock is significantly faster because you aren't dealing with nested filesystem overhead.
concurrent = 4
check_interval = 0

[[runners]]
  name = "CoolVDS-NVMe-Runner-01"
  url = "https://gitlab.example.com/"
  token = "YOUR_TOKEN"
  executor = "docker"
  [runners.custom_build_dir]
  [runners.cache]
    Type = "s3"
    [runners.cache.s3]
      ServerAddress = "minio.internal:9000"
      AccessKey = "minio"
      SecretKey = "minio123"
      BucketName = "runner-cache"
      Insecure = true
  [runners.docker]
    tls_verify = false
    image = "docker:24.0.5"
    privileged = true  # only required if you fall back to dind; can be false with socket binding
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    shm_size = 0
    pull_policy = "if-not-present"
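Before wiring this into a pipeline, it's worth a quick sanity check from the runner host that a container using the bound socket can actually talk to the host daemon; this one-liner simply asks the daemon for its version through the mounted socket.
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock docker:24.0.5 \
  docker info --format '{{.ServerVersion}}'
If that prints the host's Docker version, your jobs can build and push images without any nested overlay filesystem in the way.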
The Database Bottleneck in Tests
Integration tests often spin up ephemeral databases. By default, MySQL/MariaDB is configured for data durability (ACID compliance). In a CI pipeline, if the container crashes, we don't care about the data. We care about speed.
You can drastically reduce test times by mounting a custom my.cnf that disables disk syncing into your service containers. This is risky for production, but perfect for CI.
[mysqld]
# DANGEROUS FOR PRODUCTION - USE ONLY FOR CI
innodb_flush_log_at_trx_commit = 2
sync_binlog = 0
innodb_buffer_pool_size = 512M
innodb_log_file_size = 128M
This tells the database: "Don't wait for the disk to confirm the write before moving on." On a standard VPS, this can cut test suite runtime by 30-40%.
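Outside of GitLab's services: syntax, you can try the same trick locally with a plain docker run; the ci/my.cnf path, container name, and throwaway password below are placeholder assumptions, and the official mysql image picks up any *.cnf dropped into /etc/mysql/conf.d.
docker run -d --name ci-mysql \
  -e MYSQL_ROOT_PASSWORD=ci-only \
  -v "$PWD/ci/my.cnf:/etc/mysql/conf.d/ci-speed.cnf:ro" \
  mysql:8.0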
Data Sovereignty and Latency in Norway
For those of us operating out of Oslo or working with clients like DNB or public sector entities, GDPR and the legacy of Schrems II are not just buzzwords. They are legal requirements. Sending your build artifacts—which contain your intellectual property and potentially sensitive test data—to a runner hosted in a US-owned cloud availability zone can trigger compliance headaches.
Keeping your CI/CD pipeline inside Norway isn't just about compliance; it's about latency. If your Git repository is hosted in Northern Europe and your runners are in Oslo, the clone operations are snappy. CoolVDS infrastructure sits directly on high-speed backbones connected to NIX (Norwegian Internet Exchange), ensuring that the handshake between your repo and your runner is near-instant.
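You can put a rough number on that proximity with a quick curl timing against your Git host; gitlab.example.com is just the placeholder from the runner config above, so substitute your own instance.
curl -o /dev/null -s -w 'connect: %{time_connect}s  tls: %{time_appconnect}s  total: %{time_total}s\n' \
  https://gitlab.example.com/
When both ends sit on the same Norwegian backbone, the TCP connect time should land in the low single-digit milliseconds.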
Why KVM is Non-Negotiable for CI/CD
You might be tempted by OpenVZ or LXC containers for your runners because they are cheap. Don't do it. CI/CD workloads often need kernel-level features: Docker relies on kernel namespaces and cgroups.
Running Docker inside an OpenVZ container is a nightmare of kernel module incompatibilities. You need full hardware virtualization. KVM (Kernel-based Virtual Machine), which is the standard hypervisor for CoolVDS, provides a dedicated kernel for your instance. This allows you to load specific modules required for complex build environments (like eBPF tracing tools or specific VPN clients for deployment) without begging support to enable them on the host node.
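If you're not sure what your current provider actually runs underneath you, two commands usually settle it; both ship with any mainstream systemd-based distro.
systemd-detect-virt   # "kvm" means full virtualization; "openvz" or "lxc" means a shared kernel
uname -r              # a kernel you can't change or upgrade yourself is another giveaway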
Automated Cleanup Strategy
Fast disks fill up fast. Docker allows unused images and build caches to accumulate, eventually halting your pipeline with "No space left on device". A simple cron job on your runner is essential to keep the NVMe storage fresh for active builds.
#!/bin/bash
# /usr/local/bin/cleanup_docker.sh
# Remove unused containers
docker container prune -f
# Remove dangling images (layers not used by any tagged image)
docker image prune -f
# Remove build cache older than 48 hours
docker builder prune --filter "until=48h" -f
# Check disk usage and alert if > 80%
USAGE=$(df -P / | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
if [ "$USAGE" -gt 80 ]; then
echo "Disk usage critical: ${USAGE}%" | mail -s "CI Runner Disk Alert" ops@example.no
fi
Add this to your crontab to run daily at 4 AM local time.
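A minimal entry for that, assuming the script above is saved as /usr/local/bin/cleanup_docker.sh and marked executable, could look like this in /etc/cron.d/docker-cleanup:
# Run the Docker cleanup daily at 04:00 and keep a log for later inspection
0 4 * * * root /usr/local/bin/cleanup_docker.sh >> /var/log/docker-cleanup.log 2>&1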
The Verdict
Optimization is about removing friction. In a CI/CD pipeline, friction is latency: I/O waits, stolen CPU cycles, slow clones. You cannot code your way out of slow hardware.
If your team is deploying ten times a day, saving 3 minutes per build equals 30 minutes saved per day, or roughly 10 hours a month. That time is worth more than the cost of upgrading to a premium instance. For serious pipelines targeting the Norwegian market, you need the trifecta: KVM isolation for Docker stability, local NVMe for I/O throughput, and Norwegian data residency for compliance.
Don't let your infrastructure be the reason you miss a release window. Spin up a CoolVDS KVM instance today and see what happens when your pipeline actually breathes.