Kubernetes vs. Docker Swarm: Architecting for Stability in the High North
If I see one more production environment running off a single docker-compose up -d command inside a screen session, I might actually lose it. We are in 2020. The "works on my machine" excuse died three years ago.
But here is the reality for most DevOps teams I talk to in Oslo and Bergen: You are paralyzed by choice. Do you go with the massive complexity of Kubernetes because Google uses it? Do you stick with Docker Swarm because it's built-in? Or do you try to manage raw LXC containers because you like pain?
I have spent the last six months migrating a high-traffic e-commerce cluster from a legacy monolithic setup to a containerized architecture. Along the way, we broke networking, exhausted IOPS, and learned exactly where the breaking points are. Here is the unvarnished truth about container orchestration right now, and why your choice of infrastructure provider matters more than your choice of scheduler.
The Contenders: Swarm vs. K8s (v1.17)
Let's strip away the marketing.
Docker Swarm: The "Good Enough" Solution
Swarm is natively integrated into the Docker Engine (since 1.12). It is fast, easy to set up, and requires almost zero boilerplate. For a team of three developers managing a few microservices, Swarm is often the correct choice.
The Config: Setting up a Swarm manager is literally one command:
docker swarm init --advertise-addr 10.10.20.5
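Adding workers and shipping a service is barely more work. A rough sketch, assuming the manager address above; the worker token placeholder and the stack.yml filename are just examples, and the join token comes from the output of the init command:

# On each worker: join the swarm (default management port 2377)
docker swarm join --token <WORKER-TOKEN> 10.10.20.5:2377

# Write a minimal stack file (compose v3 syntax) and deploy it
cat > stack.yml <<'EOF'
version: "3.7"
services:
  web:
    image: nginx:1.17.8
    deploy:
      replicas: 3
    ports:
      - "80:80"
EOF
docker stack deploy -c stack.yml nordic-app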
However, Swarm struggles with advanced stateful workloads. If you need anything like Kubernetes' persistent volume claims (PVCs) or granular network policies, you will hit a wall. It also lacks the rich ecosystem of Helm charts that K8s offers.
Kubernetes: The Industrial Standard
With the recent release of version 1.17, Kubernetes has stabilized significantly. But it is a beast. It is not just an orchestrator; it is an operating system for your cluster. The learning curve is a vertical wall.
Consider the complexity of a basic deployment. In Swarm, it's 5 lines of YAML. In Kubernetes, you are defining Deployments, Services, Ingress, and ConfigMaps.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nordic-app
  labels:
    app: nordic-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nordic-app
  template:
    metadata:
      labels:
        app: nordic-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.17.8
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
Why use K8s? Because of the ecosystem. If you need Prometheus for monitoring, Cert-Manager for Let's Encrypt, or Istio for service mesh capabilities, K8s is the only game in town.
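To make the ecosystem point concrete, pulling in monitoring is a couple of Helm 3 commands. A sketch only: the stable repo URL was current when this was written, and chart names change, so verify before copying:

# Add the stable chart repo and install Prometheus into its own namespace
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo update
kubectl create namespace monitoring
helm install prometheus stable/prometheus --namespace monitoring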
The Hidden Killer: Etcd Latency and Storage I/O
Here is the war story. We deployed a K8s cluster for a client using a budget VPS provider hosted in Germany. Everything worked fine in testing. When we pushed live traffic, the API server started timing out. Pods were crash-looping. The logs were screaming about etcd timeouts.
The Problem? Disk Latency.
Kubernetes uses etcd as its brain. Etcd is extremely sensitive to disk write latency (fsync). If your underlying storage cannot commit writes fast enough, etcd misses heartbeats, leadership starts bouncing between nodes, and chaos ensues.
Pro Tip: Never run a production K8s cluster on standard HDD or shared generic SSDs where you have noisy neighbors. You need NVMe. Period.
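Before you blame the scheduler, check whether etcd itself is complaining. A quick sketch, assuming a kubeadm-style cluster where etcd serves plain-HTTP metrics on 127.0.0.1:2381 (adjust the port and the pod name placeholder to your own setup):

# On a control-plane node: inspect etcd's fsync latency histogram
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds

# And grep the etcd logs for slow-disk warnings
kubectl -n kube-system logs etcd-<node-name> | grep -i "took too long"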
We ran fio benchmarks on the budget node versus a CoolVDS NVMe instance. The difference was embarrassing for the budget provider.
# The benchmark command used
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
--name=test --filename=test --bs=4k --iodepth=64 --size=4G \
--readwrite=randwrite --ramp_time=4
Result: The generic VPS had 45ms fsync latency. CoolVDS NVMe had sub-1ms latency. We migrated the masters to CoolVDS, and the timeouts vanished instantly.
Network Latency: The Norway Advantage
Data sovereignty is not just a legal buzzword; it is a technical reality. With GDPR enforcement tightening and the US Cloud Act making everyone nervous, keeping data inside Norway is becoming a hard requirement for many of our clients.
But beyond the legal aspect (Datatilsynet is watching), there is physics. If your users are in Oslo and your servers are in a massive datacenter in Virginia, you are adding 90ms of latency to every handshake. For a dynamic application making multiple database calls, that delay compounds.
CoolVDS operates out of datacenters directly connected to NIX (Norwegian Internet Exchange). Pinging a CoolVDS IP from a Telenor fiber connection in Oslo takes about 2ms. That snappy response time does more for your UX than any amount of JavaScript optimization.
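Don't take my word for it; measure from where your users actually sit. Replace the placeholder with your own endpoint:

# Round-trip time and per-hop latency from a local connection
ping -c 10 <your-server-ip>
mtr --report --report-cycles 20 <your-server-ip>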
Configuration Tuning for 2020
Whether you choose Swarm or Kubernetes, do not stick with the defaults. Linux defaults are often set for general-purpose computing, not high-throughput container networking.
1. Optimize sysctl for Conntrack
Containers generate a massive number of connections. You will hit the `nf_conntrack` limit faster than you think.
# /etc/sysctl.conf
net.netfilter.nf_conntrack_max = 131072
net.ipv4.tcp_keepalive_time = 600
net.ipv4.ip_forward = 1
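Apply the changes and keep an eye on how close you are running to the ceiling. A simple sketch:

# Load the new values and compare current usage against the limit
sysctl -p
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# When the count hits the max, the kernel starts dropping packets
dmesg | grep -i conntrack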
2. Database Performance in Containers
Running MySQL or PostgreSQL in Docker is fine, provided you tune the buffer pool yourself. Don't leave it to the image defaults.
# my.cnf optimization for a 4GB CoolVDS instance
[mysqld]
innodb_buffer_pool_size = 2G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 2 # Speed over strict ACID for non-financial data
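To actually feed that file to a container, mount it into the config directory the official MySQL image reads from. A minimal sketch; the host path, container name, and password are examples:

# Mount the tuned config read-only and keep the data on a named volume
docker run -d --name mysql \
  -v /srv/mysql/my.cnf:/etc/mysql/conf.d/tuning.cnf:ro \
  -v mysql-data:/var/lib/mysql \
  -e MYSQL_ROOT_PASSWORD=change-me \
  mysql:5.7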
The Verdict
Choose Docker Swarm if:
- You have a team of fewer than five developers.
- You don't need complex auto-scaling logic.
- You value sleep over configuring Ingress controllers.
Choose Kubernetes if:
- You are building a cloud-native platform intended to scale.
- You need precise control over resource quotas (CPU/RAM).
- You are ready to invest time in infrastructure as code.
Regardless of your orchestrator, the hardware underneath dictates your stability. Virtualization overhead is real. We use KVM at CoolVDS because each instance gets its own kernel and genuine isolation, without the performance penalty of full emulation.
Don't let slow I/O kill your SEO or your uptime. If you are building for the Nordic market, you need low latency and high IOPS.
Ready to test your cluster performance? Deploy a CoolVDS NVMe instance in 55 seconds and run your own fio tests. The results will speak for themselves.