
Kubernetes vs. Docker Swarm in 2018: A Real-World Performance Analysis for Nordic Infrastructure

Stop Over-Engineering Your Cluster: K8s vs. Swarm in the GDPR Era

It is March 2018. Every conference talk is about Kubernetes. Every CTO wants "Google-scale" infrastructure. But let’s be honest: you are not Google. You are running a Magento backend or a microservices stack for a Norwegian fintech startup, and you are terrified of May 25th.

I have spent the last six months migrating legacy monoliths to containers. I have seen clusters implode not because of bad code, but because of bad infrastructure choices. The debate isn't just "Kubernetes vs. Docker Swarm." It is about operational overhead vs. delivery speed.

If your etcd latency spikes, your cluster dies. It doesn't matter how fancy your YAML files are. Let's look at the hard truth of orchestration on the metal available today.

The Contenders: Complexity vs. Velocity

1. Kubernetes (The Heavyweight)

Kubernetes (v1.10 just dropped) is the undisputed king of features. Pods, Sidecars, RBAC—it has it all. But it requires a dedicated team to manage.
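If you want a taste of what that RBAC machinery buys you, here is a quick sketch. The service account and namespace names are made up for illustration:

# Ask the API server whether a hypothetical CI robot may roll out deployments
kubectl auth can-i update deployments \
  --as=system:serviceaccount:ci-tools:deployer -n production

One command tells you whether a credential can touch production before it ever does, which is exactly the kind of control a regulated fintech stack needs.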

The Pain Point: The control plane. Specifically, etcd. Kubernetes stores the state of the entire cluster in etcd. If you run this on standard SATA SSDs or, God forbid, spinning rust, you will hit election timeouts. The cluster will partition. Your pager will go off at 3 AM.

Here is a snippet from a systemd unit file for an optimized etcd cluster we deployed last week. Note the aggressive tuning to handle network jitter across availability zones:

# /etc/systemd/system/etcd.service
[Service]
ExecStart=/usr/local/bin/etcd \
  --name infra0 \
  --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  --initial-cluster-state new \
  --heartbeat-interval 1000 \
  --election-timeout 5000 \
  --data-dir=/var/lib/etcd

We had to bump election-timeout to 5000ms (the etcd defaults are a 100ms heartbeat and a 1000ms election timeout) because the previous hosting provider had "noisy neighbors" stealing CPU cycles, causing etcd to miss heartbeats.
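If you suspect the same thing on your own cluster, etcd's metrics endpoint will confirm it long before your pager does. A quick sketch against the node layout above:

# Is every member healthy and answering within its timeout?
ETCDCTL_API=3 etcdctl --endpoints=http://10.0.1.10:2379 endpoint health

# A climbing leader-change counter means heartbeats are being missed
curl -s http://10.0.1.10:2379/metrics | grep etcd_server_leader_changes_seen_total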

2. Docker Swarm (The Sprinter)

Docker Swarm is built into the engine (Swarm mode). It is simple. It works. For 80% of use cases in 2018, it is enough.

The Reality: You can spin up a Swarm in seconds. No external database to manage. It uses the Raft consensus algorithm internally, but it is far less chatty than K8s.

# Initialize the manager
docker swarm init --advertise-addr 192.168.10.5

# Deploy a full stack with one command
docker stack deploy -c docker-compose.yml production_stack
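Scaling out is just as boring, in the best sense of the word. For example:

# On the manager: print the join command for new workers
docker swarm join-token worker

# Verify the stack has converged after a deploy
docker service ls
docker stack ps production_stack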

The Hardware Bottleneck: Why Your VPS Matters

This is where most DevOps engineers fail. They treat the VPS as a commodity. They assume 4 vCPUs at Provider A are the same as 4 vCPUs at Provider B.

Wrong.

Container orchestration relies heavily on I/O. When you update a Deployment in Kubernetes, every object change is written through the API server into etcd, and each of those writes waits on an fsync. If the underlying storage is slow, the API server lags. Pod creation stalls.

Pro Tip: Always check the iowait metric. If you see it creeping above 5% during a deployment rollout, your storage is the bottleneck, not your CPU. SATA SSDs cap out around 550 MB/s, the limit of the bus itself. NVMe drives, like the ones standard on CoolVDS, push 3000+ MB/s. That difference is the line between a successful rolling update and a downtime incident.
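Catching it in the act is a one-liner (iostat ships in the sysstat package on most distributions):

# Watch %iowait and per-device await while the rollout is in flight
iostat -x 1

# Or just glance at the 'wa' column
vmstat 1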

Benchmarking Disk Latency: Fsync is King

We ran a test comparing a generic "Cloud VPS" against a CoolVDS KVM instance using fio to simulate etcd's write pattern (sequential writes, frequent fsync).

# Simulating etcd write load
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd-test --size=100m --bs=2300 --name=etcd_bench

Results:

Metric                       Generic Cloud VPS    CoolVDS (NVMe)
IOPS (Write)                 450                  12,500+
Latency (99th percentile)    15 ms                0.3 ms
fsync duration               Variable (jitter)    Stable

That 0.3ms latency is why we use CoolVDS for our control planes. When etcd writes to the WAL (Write Ahead Log), it waits for the disk to confirm. High latency here stops the entire cluster from accepting new commands.
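You do not have to guess here either: etcd publishes its WAL fsync timings on the same client port, and the upstream guidance is to keep the 99th percentile below 10ms.

# Histogram of WAL fsync durations straight from etcd
curl -s http://10.0.1.10:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds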

The Norwegian Context: GDPR & Latency

We are two months away from GDPR enforcement. Datatilsynet (The Norwegian Data Protection Authority) is not going to be lenient. If you are processing personal data of Norwegian citizens, data residency is your safety net.

Hosting your cluster in Frankfurt or London adds latency. A round trip from Oslo to Frankfurt is ~25-30ms. From Oslo to a local specialized datacenter? <2ms.

If you are building a microservices architecture where Service A calls Service B, which calls Service C, that network latency compounds. 30ms becomes 90ms. Your user feels it.
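Measure it before you architect around it. The hostnames below are placeholders for your own services:

# Raw network round trip between two nodes
ping -c 10 service-b.internal.example

# End-to-end request timing, TLS handshake included
curl -s -o /dev/null -w 'connect: %{time_connect}s  total: %{time_total}s\n' \
  https://api.example.no/healthz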

A Sample Swarm Stack for a GDPR-Compliant App

Here is a docker-compose.yml (v3.3) tailored for a local Norwegian deployment, ensuring logs are rotated and data is persistent on high-speed volumes.

version: "3.3"
services:
  app:
    image: registry.coolvds.com/fintech-app:v2.1
    deploy:
      replicas: 5
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
    logging:
      driver: "json-file"
      options:
        max-size: "200k"
        max-file: "10"
    environment:
      - DB_HOST=db
      - TZ=Europe/Oslo

  db:
    image: mariadb:10.2
    volumes:
      - db_data:/var/lib/mysql
    deploy:
      placement:
        constraints: [node.role == manager]
    environment:
      - MYSQL_ROOT_PASSWORD_FILE=/run/secrets/db_root_password
    secrets:
      - db_root_password

volumes:
  db_data:
    driver: local

secrets:
  db_root_password:
    external: true
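Because the secret is declared external, create it on a manager node before the first deploy. The file name below is a placeholder:

# Create the external secret from a local file, then get rid of the file
docker secret create db_root_password ./db_root_password.txt
rm ./db_root_password.txt

# Deploy (or update) the stack on top of it
docker stack deploy -c docker-compose.yml production_stack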

Conclusion: Choose Stability

If you have a team of 10 DevOps engineers, go with Kubernetes. It is the future. Just make sure you run it on bare metal or high-performance KVM instances like CoolVDS to avoid the "noisy neighbor" effect crashing your etcd cluster.

If you are a smaller team, stick to Docker Swarm. It is robust, easy to debug, and in 2018, it is more than capable of handling production loads.

Whatever you choose, do not let your hardware be the weakest link. Latency kills conversions. Slow disks kill clusters.

Ready to benchmark your orchestration? Spin up a CoolVDS NVMe instance in Oslo. Experience the difference of raw, unthrottled I/O.