Stop Blaming Gradle: It's Your Disk I/O
It is 3:00 AM. The staging deployment failed again. Not because of a syntax error, and not because of a failed unit test. It timed out. Again.
If you are running Jenkins or GitLab CI on a budget VPS, you are likely fighting a battle you cannot win with software optimization alone. The culprit is almost always I/O Wait. In the shared hosting world, particularly here in the Nordics where demand is skyrocketing, "noisy neighbors" are the silent killers of CI/CD pipelines. While everyone talks about CPU cores, the real bottleneck in 2018 is storage latency.
The Anatomy of a Slow Build
Let's look at a recent scenario I debugged for a client in Oslo. They were running a standard LEMP stack build inside a Docker container. The build time fluctuated wildly—sometimes 5 minutes, sometimes 45 minutes.
We ran `top`. CPU usage was low. RAM was fine. But the load average was through the roof. The metrics that matter here are `%wa` (iowait) and, crucially for VPS users, `%st` (steal time).
Here is what we saw when we ran iostat during a heavy `npm install` sequence:
$ iostat -xz 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           14.2     0.0     4.1    68.5    12.2     1.0

Device:  rrqm/s  wrqm/s    r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
vda        0.00   12.00  45.00  120.00  850.00  4200.00    61.21    85.40 450.20   12.50  850.50   5.10  98.50
Look at the `w_await` column. 850ms to complete a write? That is nearly a full second for a single disk operation, which is unacceptable. When you are extracting thousands of small files (like `node_modules` or Java `.class` files), that latency compounds. A 1ms delay repeated 10,000 times is 10 seconds of lost life; now scale that up to 850ms per operation.
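If you want to reproduce this on your own box before pointing fingers, fio is the standard benchmarking tool for exactly this access pattern. Here is a minimal sketch with placeholder job parameters; it hammers the disk with 4k random writes, so run it in a scratch directory:

# Measure 4k random-write latency, roughly what a dependency install generates.
# All sizes and durations here are arbitrary; tune them for your environment.
fio --name=ci-latency-test --rw=randwrite --bs=4k --size=512m \
    --ioengine=libaio --direct=1 --runtime=30 --time_based \
    --numjobs=1 --group_reporting

On a healthy NVMe-backed VM the completion latencies fio reports should sit in the microsecond-to-low-millisecond range. If the averages land anywhere near the iostat numbers above, the storage layer, not your build scripts, is the problem.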
Furthermore, the `%steal` value of 12.2% indicates that the hypervisor is prioritizing other tenants over this VM. This is the hallmark of oversold hosting.
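A quick way to keep an eye on both metrics over time, without parsing iostat output, is plain vmstat (the one-second interval and 60-sample count below are arbitrary):

# Sample CPU stats once per second for 60 seconds; the last two columns
# (wa = iowait, st = steal) are the ones that matter on a VPS.
vmstat 1 60

If st sits consistently above a few percent while your builds run, the contention is happening below your VM, not inside it.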
The Solution: Pipeline Optimization & Real Hardware
To fix this, we need a two-pronged approach: optimizing the Docker cache and moving to infrastructure that actually respects your I/O requirements.
1. Docker Layer Caching
If you are rebuilding your entire image on every commit, you are doing it wrong. In your `Dockerfile`, order matters. Copy your dependency definitions first, install them, and then copy your source code. This allows Docker to cache the heavy installation layer.
FROM node:9-alpine
WORKDIR /app
# COPY package.json AND package-lock.json ONLY
COPY package*.json ./
# This layer is cached unless dependencies change
RUN npm install --production
# Now copy the rest of the source
COPY . .
CMD ["npm", "start"]
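One detail that quietly undermines this: if `node_modules` or `.git` ends up in the build context, every build ships a pile of unnecessary files to the Docker daemon, which is exactly the kind of small-file I/O we are trying to avoid. A minimal `.dockerignore`, assuming a standard Node project layout, is cheap insurance:

# Keep heavy, frequently-changing paths out of the Docker build context.
# The file list below is a hypothetical minimum; extend it for your project.
cat > .dockerignore <<'EOF'
node_modules
.git
npm-debug.log
EOF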
In your Jenkins Pipeline (using the Declarative syntax, which is finally becoming stable in the 2.x line), lean on that image cache explicitly and clean up after every build so the workspace and Docker storage do not thrash the disk:
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                sh 'docker build --cache-from my-image:latest -t my-image:${BUILD_NUMBER} .'
            }
        }
        stage('Test') {
            steps {
                sh 'docker run --rm my-image:${BUILD_NUMBER} npm test'
            }
        }
    }
    post {
        always {
            // Clean up to save disk space, crucial on NVMe
            sh 'docker system prune -f'
        }
    }
}
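One caveat the pipeline glosses over: `--cache-from` only helps if the layers of `my-image:latest` actually exist on the build agent. A common pattern, sketched here with the same hypothetical image name (swap in your registry path), is to pull the previous image before building and promote the new one afterwards:

# Pull the last known-good image so its layers can seed the build cache;
# '|| true' keeps the very first build from failing when no image exists yet.
docker pull my-image:latest || true

# BUILD_NUMBER is provided by Jenkins in a pipeline context.
docker build --cache-from my-image:latest -t my-image:${BUILD_NUMBER} .

# After tests pass, retag so the next run can reuse these layers.
docker tag my-image:${BUILD_NUMBER} my-image:latest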
2. The Hardware: Why NVMe Matters in 2018
SATA SSDs were a revolution, but NVMe (Non-Volatile Memory Express) is the standard for high-performance workloads. NVMe connects directly to the PCIe bus, bypassing the SATA controller bottleneck.
Pro Tip: If your hosting provider doesn't specify "NVMe," assume it is SATA SSD or, heaven forbid, spinning rust (HDD). For database imports and CI builds, NVMe offers up to 6x the sequential throughput of a SATA SSD, and the gap in random I/O under concurrent load is even larger.
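You don't have to take a datasheet's word for it. ioping (a small utility available in most distro repositories) gives a quick, if crude, latency reading for whatever volume your builds actually touch; the target directory below is just an example and usually needs root:

# Ten latency samples against the volume holding Docker's data directory.
sudo ioping -c 10 /var/lib/docker

As a rough guide, spinning disks come back in the 5-15ms range, SATA SSDs well under a millisecond, and NVMe in the tens of microseconds. Where your current VPS lands in that spread says a lot about what is really behind the marketing.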
At CoolVDS, we realized early on that containerization requires massive random I/O performance. That is why our Norwegian nodes are built exclusively on enterprise NVMe arrays. We don't use OpenVZ (where kernel resources are shared and easily abused); we use KVM (Kernel-based Virtual Machine) to ensure that your RAM and CPU are actually yours.
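Not sure what your current provider runs under the hood? The guest will usually tell you (systemd-detect-virt ships with systemd; virt-what is the older alternative):

# Prints the detected virtualization technology: kvm, openvz, xen, qemu, ...
systemd-detect-virt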
Nginx Reverse Proxy for Jenkins
Once your pipeline is running smoothly, you need to secure the Jenkins dashboard. Do not expose Jenkins directly on port 8080. Put it behind Nginx with SSL (Let's Encrypt certificates are now easy to obtain via `certbot`).
Here is a battle-tested `nginx.conf` snippet for 2018-era Jenkins that makes sure WebSocket connections work for live pipeline logs:
server {
    listen 80;
    server_name ci.yourdomain.no;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name ci.yourdomain.no;

    ssl_certificate /etc/letsencrypt/live/ci.yourdomain.no/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ci.yourdomain.no/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Fixes "It appears that your reverse proxy set up is broken"
        proxy_redirect http://127.0.0.1:8080 https://ci.yourdomain.no;

        # Required for WebSocket connections (live pipeline log streaming)
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
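The certificate paths in that config assume you have already obtained a certificate for the domain. A minimal sketch of the remaining steps, using certonly with the nginx authenticator so certbot does not rewrite the server block for you:

# Obtain the certificate without letting certbot edit the config above.
sudo certbot certonly --nginx -d ci.yourdomain.no

# Always validate before reloading; a typo here takes the CI dashboard offline.
sudo nginx -t && sudo systemctl reload nginx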
The Norwegian Context: GDPR is Coming
We are only months away from May 25, 2018. The General Data Protection Regulation (GDPR) is going to change how we handle data. If your CI/CD pipeline processes production dumps for testing (sanitized or not), data sovereignty becomes a legal issue.
Hosting outside the EU/EEA is becoming a compliance nightmare. By keeping your build servers and staging environments on CoolVDS infrastructure located in Oslo, you benefit from:
- Data Sovereignty: Your data stays under Norwegian jurisdiction and Datatilsynet oversight.
- Low Latency: Direct peering at NIX (Norwegian Internet Exchange) means latency to local ISPs (Telenor, Telia) is often sub-5ms; see the quick check after this list.
- Stability: Norwegian hydropower grids provide some of the most stable uptime in Europe.
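That latency figure is easy to verify from your own office connection (mtr needs installing on most systems, and the hostname below is a placeholder for your actual server):

# Ten-probe report mode; the Avg column on the final hop is your round-trip latency.
mtr --report --report-cycles 10 your-server.coolvds.example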
Conclusion
You can optimize your `Jenkinsfile` until it is a work of art, but you cannot code your way out of bad physics. If your `iowait` is high and your `steal time` is creeping up, it is time to move.
Don't let your infrastructure be the bottleneck that delays your release. Test your pipeline on a KVM-based, NVMe-powered instance today. Spinning up a CoolVDS server takes less than 55 seconds—likely faster than your current `npm install`.