Stop Over-Engineering Your Cluster: A Pragmatic Look at Orchestration in 2021
Let’s be honest. If I hear one more startup with three developers and a monolithic PHP app tell me they need a multi-region Kubernetes federation, I might just `rm -rf /` my own workstation. The hype cycle in 2021 has reached a fever pitch. Everyone wants to be Google, but nobody wants to manage the operational debt that comes with it.
I’ve spent the last decade fixing broken infrastructure across Europe. I’ve seen Docker Swarm clusters running critical banking services seamlessly, and I’ve seen Kubernetes clusters implode during Black Friday because the `etcd` latency spiked by 10ms. In Norway, where we have specific legal constraints (thanks, Schrems II) and high expectations for uptime, picking the right tool is not about fashion. It’s about survival.
Today, we aren't just looking at features. We are looking at the Total Cost of Ownership (TCO), hardware dependencies, and why running these orchestrators on the wrong VPS is a death sentence for your performance.
The Contenders
1. Kubernetes (K8s): The 800lb Gorilla
Kubernetes is the standard. Version 1.22 dropped recently, and the removal of Dockershim is looming, causing panic in teams that haven't migrated to containerd yet. K8s is powerful, but it is heavy. A standard control plane requires significant CPU cycles just to exist.
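If you're not sure whether your nodes are still sitting on the Docker runtime, `kubectl` will tell you (assuming you have access to the cluster, obviously):

```bash
# The CONTAINER-RUNTIME column shows docker:// vs containerd:// per node
kubectl get nodes -o wide
```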
When to use it: You have a team of at least 3 dedicated DevOps engineers, or you are running microservices with complex auto-scaling requirements. If you need a service mesh like Istio, you need K8s.
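For the basic case you don't even need a hand-written HorizontalPodAutoscaler manifest. A minimal sketch, assuming a metrics-server is installed and using the `nginx-production` Deployment from the manifest below:

```bash
# Keep between 3 and 10 replicas, targeting 80% average CPU utilisation
kubectl autoscale deployment nginx-production --min=3 --max=10 --cpu-percent=80
# Check what the autoscaler is doing
kubectl get hpa nginx-production
```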
Here is a standard, no-nonsense deployment manifest. Note the resource requests and limits. If you don't define these, your neighbors on a shared host will hate you (and sooner or later the kernel's OOM killer will strike).
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-production
  labels:
    app: web-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-gateway
  template:
    metadata:
      labels:
        app: web-gateway
    spec:
      containers:
        - name: nginx
          image: nginx:1.21-alpine
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "64Mi"
              cpu: "250m"
            limits:
              memory: "128Mi"
              cpu: "500m"
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
```
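Rolling it out is deliberately boring; the file name below is just whatever you saved the manifest as:

```bash
kubectl apply -f nginx-production.yaml
# Blocks until all 3 replicas pass their readiness probes
kubectl rollout status deployment/nginx-production
```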
2. Docker Swarm: The "Dead" Tech That Won't Die
People keep saying Swarm is dead. Yet, I deploy it weekly. Why? Because `docker stack deploy` just works. It is built into the Docker engine. There is no separate `etcd` to manage: Swarm's Raft store is embedded in the manager nodes, and its consensus traffic is lighter. For a team needing to host a few services with high availability, Swarm is incredibly efficient.
The Reality Check: Swarm lacks the rich ecosystem of K8s (no Helm charts, no CRDs or operators). But for 90% of use cases in Norway's SME sector, it is sufficient.
version: "3.8"
services:
web:
image: nginx:alpine
deploy:
replicas: 5
update_config:
parallelism: 2
delay: 10s
restart_policy:
condition: on-failure
ports:
- "80:80"
networks:
- webnet
networks:
webnet:3. Nomad: The Unix Philosophy
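Deploying that stack onto an initialised Swarm is one command; the stack name and file name here are my own placeholders:

```bash
# Assumes 'docker swarm init' has already been run on the first manager
docker stack deploy -c docker-stack.yml web
# Verify that all 5 replicas converged
docker service ls
docker service ps web_web
```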
3. Nomad: The Unix Philosophy
HashiCorp's Nomad is a single binary. It schedules containers, Java jars, or just raw binaries. It is simpler than K8s but more flexible than Swarm. It integrates perfectly with Consul and Vault.
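If you want to feel how small it is, the whole hello-world flow is a handful of commands; a sketch assuming you've just dropped the binary onto a box:

```bash
# Throwaway single-node dev agent (server and client in one process); run in its own terminal
nomad agent -dev
# Generate the stock example job (a containerised Redis) and submit it
nomad job init
nomad job run example.nomad
# Watch the allocation get placed
nomad job status example
```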
The Hidden Bottleneck: Storage I/O
Here is the part most "cloud architects" ignore. They obsess over the scheduler but deploy it on cheap storage. Orchestrators are essentially distributed databases. Kubernetes relies on `etcd`. Swarm relies on its internal Raft store.
If your disk write latency is high, Raft log writes stall and leader elections start to churn. Nodes flap. The cluster thinks a healthy node is dead. Chaos ensues.
Pro Tip: Never run a production cluster on standard HDD or shared SATA SSDs where you are fighting other tenants for IOPS. Give the control plane dedicated, low-latency storage; local NVMe if you can get it.
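And don't take the datasheet's word for it: benchmark the disk the way `etcd` will actually use it. This is the usual `fio` fsync check; the target directory is my own placeholder and must live on the disk you intend to use:

```bash
# Simulates etcd's small, fdatasync-heavy WAL writes
mkdir -p /var/lib/etcd-bench
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-bench --size=22m --bs=2300 --name=etcd-wal-check
# In the output, look at the fsync/fdatasync percentiles:
# the 99th percentile should stay comfortably under 10ms.
```

If that number lands in the tens of milliseconds, no amount of scheduler tuning will save you.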