
Orchestration Wars 2019: Kubernetes vs. Docker Swarm vs. Nomad for Nordic Infrastructure

Let’s get one thing straight: most of you do not need Google-scale infrastructure. I’ve sat in too many meetings in Oslo business parks watching CTOs over-engineer their stack because they read a Medium article about microservices. They deploy a 12-node Kubernetes cluster to host a WordPress site and a Redis cache, then wonder why their cloud bill rivals a small mortgage and their latency to NIX (Norwegian Internet Exchange) is erratic.

I’ve spent the last six months migrating a fintech client off a messy OpenStack setup. The lesson? Complexity is technical debt. If you cannot debug it at 3:00 AM after a double espresso, don't deploy it.

Today, strictly looking at the landscape as it stands in late 2019, we are comparing the three main contenders: Kubernetes (the heavyweight), Docker Swarm (the native choice), and HashiCorp Nomad (the pragmatist's choice). We will also discuss why your choice of underlying VPS—specifically regarding NVMe storage and KVM isolation—matters more than which YAML dialect you speak.

The Latency Trap: Why Hardware Still Wins

Before we touch the software, we need to address the physics. Orchestration relies heavily on a distributed state store. In Kubernetes, that store is etcd. If your underlying storage cannot deliver high IOPS (Input/Output Operations Per Second) at low latency, etcd starts timing out heartbeats. Your nodes go NotReady. Your pods get rescheduled. The cluster eats itself.

Pro Tip: Never run a production orchestration cluster on standard HDD or shared-kernel containers (like OpenVZ). You need dedicated KVM resources. The noisy neighbor effect on storage I/O will kill your consensus algorithm.
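
If you are not sure what you are actually sitting on, a quick sanity check from inside the guest (assuming a reasonably modern systemd-based distro) tells you whether there is a real hypervisor underneath or a shared kernel:

# Reports "kvm" on full virtualization, "openvz" or "lxc" on shared-kernel containers
systemd-detect-virt

# Cross-check: KVM guests report a hypervisor vendor here
lscpu | grep -i hypervisor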

Here is how I benchmark storage before even installing Docker. If I don't see write latencies under 5ms, I kill the server. On CoolVDS NVMe instances, we consistently see fsync times that make etcd happy.

# The only benchmark that matters for etcd stability
# (fio needs the target directory to exist first)
mkdir -p test-data
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=test-data --size=22m --bs=2300 \
    --name=mytest

If the 99th percentile fdatasync latency is above 10ms, your Kubernetes cluster will eventually fail under load. Period.
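
Once the cluster is actually running, cross-check that number against etcd's own measurements. The metric names are standard; the certificate paths below assume a stock kubeadm layout, so adjust them to wherever your etcd keeps its PKI:

# etcd's own view of WAL fsync latency (a histogram, values in seconds)
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/server.crt \
     --key /etc/kubernetes/pki/etcd/server.key \
     https://127.0.0.1:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds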

1. Kubernetes (K8s): The De Facto Standard

With the release of version 1.16 just last month, Kubernetes stopped serving several long-deprecated APIs (the `extensions/v1beta1` variants of Deployments, DaemonSets, and ReplicaSets). It’s maturing, but it is heavy. It requires a dedicated team to manage effectively.
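
Before that upgrade lands, ask the API server what it still serves and grep your own manifests for the old group (the manifest directory below is only a placeholder):

# Which API groups does this cluster still serve?
kubectl api-versions | grep -E '^(extensions|apps)/'

# Deployments, DaemonSets and ReplicaSets must target apps/v1 from 1.16 onwards
grep -R "apiVersion: extensions/v1beta1" k8s-manifests/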

The Use Case

Use K8s if you have a team of at least 5 DevOps engineers, strict regulatory requirements (like specific GDPR data isolation rules enforced by NetworkPolicies), or a truly heterogeneous workload.

The Configuration Reality

A simple deployment is never simple. You need to manage Ingress controllers, CNI plugins (Calico, Flannel), and persistent storage. Here is a standard snippet for a high-availability deployment we use for a client in Trondheim:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-gateway
  labels:
    app: gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gateway
  template:
    metadata:
      labels:
        app: gateway
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - gateway
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: nginx
        image: nginx:1.17.4-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"

Notice the podAntiAffinity. We use it to keep the replicas on separate nodes, so a single hypervisor failure cannot take out every copy of the gateway. This only works effectively if your provider offers true anti-affinity at the VM level, something the CoolVDS architecture supports implicitly by balancing load across physical hosts.
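
After the rollout, one command confirms the scheduler actually honored the rule (the label matches the Deployment above):

# The NODE column should show a different host for each of the three replicas
kubectl get pods -l app=gateway -o wide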

2. Docker Swarm: The "Good Enough" Solution

Docker Swarm is effectively dead in the hype cycle but alive and well in production. It is built into the Docker Engine, so there is no extra binary to install, and it uses the same docker-compose.yml syntax you already use for development.

The Use Case

Small to medium teams (1-50 developers) who want to move from a single server to a cluster without hiring a K8s administrator. It handles overlay networking and secrets management out of the box.

version: '3.7'
services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
    ports:
      - "80:80"
    networks:
      - webnet

networks:
  webnet:

The command to deploy? docker stack deploy -c docker-compose.yml mystack. That's it. For 80% of the shops in Norway running straightforward CRUD apps, this is sufficient.
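
For reference, the whole lifecycle from a plain Docker host to a running stack looks roughly like this on Docker 19.03 (the advertise address is a placeholder for your manager's private IP):

# Promote the current host to swarm manager
docker swarm init --advertise-addr 10.0.0.1

# Run the printed "docker swarm join --token ..." command on each worker, then:
docker stack deploy -c docker-compose.yml mystack

# Watch the replicas converge
docker service ls
docker stack ps mystack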

3. HashiCorp Nomad: The Unix Philosophy

Nomad is a scheduler, not a complete platform. It doesn't care if you are running Docker containers, Java JARs, or raw binaries. It is a single binary that is incredibly lightweight.

The Use Case

You have legacy applications that cannot be containerized yet, or you want extreme simplicity. I’ve seen Nomad clusters scale to 10,000 nodes without the etcd performance issues that plague K8s.
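
Standing up a throwaway Nomad environment to evaluate it is a five-minute job. A rough sketch, assuming a current 0.10.x binary from releases.hashicorp.com (dev mode runs server and client in one process and is for evaluation only, never production):

# Single binary, no dependencies (version shown is illustrative)
curl -sO https://releases.hashicorp.com/nomad/0.10.0/nomad_0.10.0_linux_amd64.zip
unzip nomad_0.10.0_linux_amd64.zip && sudo mv nomad /usr/local/bin/

# Dev mode: server and client in one in-memory process
nomad agent -dev &

# Generate the sample job (a Redis container), run it, and inspect the result
nomad init
nomad job run example.nomad
nomad job status example
nomad node status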

The Network & Data Privacy Angle (Norway Context)

In 2019, data residency is the hot topic. Datatilsynet is watching. If you are handling Norwegian citizen data, you need to know exactly where those bits live. Using a US-based hyperscaler often involves legal gray areas regarding the CLOUD Act.

Running your own orchestration layer on a local Norwegian VPS provider like CoolVDS ensures you have total control. No hidden replication to a data center in Frankfurt or Iowa. Your data stays in the jurisdiction you chose.

Optimizing for the Norwegian Grid

Latency from Oslo to internal Norwegian endpoints is typically 2-4ms. Latency to Central Europe is 25-35ms. For database replication (like MySQL Group Replication or Galera), this difference is massive.
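
Do not take those numbers on faith; measure the path between your actual replication nodes (the IP below is a placeholder for one of your own):

# Round-trip time between replication nodes
ping -c 20 -q 10.0.0.2

# Per-hop latency and packet loss in one report
mtr --report --report-cycles 20 10.0.0.2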

Here is a generic my.cnf tweak for Galera clusters running on our NVMe instances to handle the latency while maintaining ACID compliance:

[mysqld]
# NVMe Optimization
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_method = O_DIRECT

# Galera / Clustering
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_provider = /usr/lib64/galera-3/libgalera_smm.so
wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
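
After the nodes have joined, a few standard wsrep status counters tell you whether the cluster is healthy and what the latency is costing you (this assumes the mysql client can authenticate, for example via ~/.my.cnf):

# Expect the full node count here, and wsrep_ready = ON on every node
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size'"
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_ready'"

# Fraction of time flow control paused replication; should stay near zero
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused'"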

Conclusion: Don't Follow the Herd

If you are building a bank, use Kubernetes. If you are building a startup, use Swarm or Nomad.

But regardless of the scheduler, remember that software cannot fix slow hardware. A container orchestrator is only as stable as the virtual machine it runs on. We designed CoolVDS specifically for these high-I/O workloads—offering pure KVM virtualization and NVMe storage that doesn't choke when your cluster rebalances.

Don't let IO wait kill your uptime. Spin up a CoolVDS instance today and test your disk latency yourself. If it’s not instant, it’s broken.