Kubernetes vs. Swarm in 2021: Stop Over-Engineering Your Infrastructure
Let’s be honest. Half of you reading this are running a Kubernetes cluster that burns more CPU cycles managing itself than serving your actual application. It’s August 2021, and the industry obsession with "Google-scale" infrastructure has led to a plague of over-engineering.
I recently audited a setup for a logistics firm in Bergen. They were burning $4,000 a month on a managed Kubernetes solution to host... a monolithic PHP application and a Redis cache. The latency was terrible. Not because of the code, but because the overlay network overhead was choking the packet flow. We migrated them to a simple Docker Swarm setup on bare-metal adjacent KVM instances. Costs dropped by 60%. Latency to NIX (Norwegian Internet Exchange) dropped to 2ms.
If you are a CTO or Lead DevOps operating in Europe today, you have two battles: Technical complexity and the legal minefield of Schrems II. Here is the pragmatic breakdown of where to run your containers this year.
The Contenders: K8s vs. The "Dead" Swarm
Docker Swarm is not dead. Mirantis acquired the enterprise arm, but the community edition remains built into the Docker engine. It is robust, boring, and fast.
Scenario A: The Case for Docker Swarm
If you have a team of three developers and you need to deploy a microservices stack without hiring a full-time Site Reliability Engineer (SRE), use Swarm. The learning curve is essentially zero if you know Docker Compose.
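For context, standing up the cluster itself is a two-command affair (the IP below is a placeholder for your manager node's private address):

# On the node that will become the first manager
docker swarm init --advertise-addr 10.0.0.10

# On every other node, using the join token the init command prints
docker swarm join --token <worker-token> 10.0.0.10:2377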
Here is a production-ready stack definition. Notice the simplicity:
version: '3.8'
services:
  web:
    image: nginx:1.21-alpine
    deploy:
      replicas: 4
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure
    ports:
      - "80:80"
    networks:
      - webnet
  api:
    image: my-registry.com/backend:v4.2
    deploy:
      placement:
        constraints:
          - node.role == worker
    environment:
      - DB_HOST=10.0.0.5
    networks:
      - webnet
networks:
  webnet:
    driver: overlay
    driver_opts:
      encrypted: "true"
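Assuming you save that as stack.yml (the filename and stack name here are just examples), rolling it out and checking on it looks like this:

# Deploy or update the stack from a manager node
docker stack deploy -c stack.yml web_stack

# Inspect replica counts and task placement
docker stack services web_stack
docker service ps web_stack_web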
That encrypted: "true" flag in the overlay network is critical. It uses IPsec ESP for data plane traffic. On budget VPS providers with old CPUs, this kills throughput. On CoolVDS instances, where we pass through AES-NI instructions from the host, the overhead is negligible.
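Before flipping that flag, it is worth confirming the hypervisor actually exposes AES-NI to your guest. A quick check on any Linux VM:

# Prints "aes" if the host passes the instruction set through to the guest
grep -o -m1 -w aes /proc/cpuinfo

If that comes back empty, the overlay encryption falls back to software crypto and you will feel it in throughput.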
Scenario B: The Kubernetes Imperative
You need Kubernetes (K8s) if:
- You need Custom Resource Definitions (CRDs) for operators (e.g., Prometheus Operator).
- You require granular Role-Based Access Control (RBAC) for a large team (a minimal example follows this list).
- You are running a hybrid cloud setup.
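To make the RBAC point concrete, here is a minimal, illustrative policy that lets a CI service account roll out deployments in one namespace and nothing else. All names and the namespace are placeholders:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: staging
  name: deploy-manager
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: staging
  name: ci-deploy-binding
subjects:
  - kind: ServiceAccount
    name: ci-bot
    namespace: staging
roleRef:
  kind: Role
  name: deploy-manager
  apiGroup: rbac.authorization.k8s.io

Try expressing that kind of per-namespace, per-verb restriction in Swarm and you will quickly hit the ceiling.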
But K8s is fragile. The control plane relies heavily on etcd, a distributed key-value store that is extremely sensitive to disk write latency, because every write is fsynced to its write-ahead log (WAL). If your underlying storage cannot guarantee low fsync latency, leader elections fail. The cluster split-brains. You go offline.
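If you suspect storage is already hurting a running cluster, etcd's own Prometheus metrics will tell you. A quick sketch, assuming a kubeadm-style install where the plaintext metrics listener sits on 127.0.0.1:2381 (adjust the endpoint for your setup):

# Sustained p99 WAL fsync latency above ~10ms is a red flag
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds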
The Storage Bottleneck
Most hosting providers oversell IOPS. They put you on a shared SATA SSD array with 50 other noisy neighbors. When a neighbor runs a backup, your K8s API server times out.
At CoolVDS, we use local NVMe storage. We don't use network-attached block storage for root volumes because the network latency adds up. Here is how you verify your disk latency for etcd compliance:
# Run this inside your node
fio --rw=write --ioengine=sync --fdatasync=1 \
--directory=test-data --size=22m --bs=2300 \
--name=mytest
If the 99th percentile fdatasync duration is > 10ms, your Kubernetes cluster will be unstable. On our standard NVMe instances, we consistently see under 2ms.
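If you want to bake that check into a node-bootstrap script, something like the following works; the job name and log file are arbitrary, and the exact output layout varies slightly between fio versions:

# Save the run, then pull out the sync latency percentiles
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=test-data --size=22m --bs=2300 \
    --name=etcd-check --output=fio-etcd.log
grep -A 5 "sync percentiles" fio-etcd.log   # watch the 99.00th entry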
The "Schrems II" Reality Check
Since the CJEU ruling last year (July 2020), transferring personal data to US-owned cloud providers is legally risky for Norwegian businesses. The Datatilsynet (Norwegian Data Protection Authority) is watching. Relying on Standard Contractual Clauses (SCCs) is no longer a safe default on its own.