
Kubernetes vs. Docker Swarm: The 2021 Infrastructure Reality Check

It is December 2020, and the DevOps community is in a mild state of panic. Kubernetes 1.20 just dropped, and with it came the deprecation notice for dockershim. Twitter is on fire. CTOs are asking if their pipelines are about to break. If you have been in the trenches as long as I have, you know this is just another Tuesday.
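Before joining the panic, check what your nodes actually report. A quick look, assuming you have kubectl access to the cluster:

# The CONTAINER-RUNTIME column tells you whether dockershim is even in play
kubectl get nodes -o wide

If it already says containerd or cri-o, the deprecation is a non-event for you.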

I have spent the last week migrating a high-traffic e-commerce cluster from a legacy bare-metal setup to a virtualized environment. The decision paralysis is real. Do you need the infinite configurability of Kubernetes, or do you just need to ship containers without a PhD in YAML?

Let’s cut through the noise. We are going to look at this from a pure infrastructure perspective, focusing on what keeps your pagers silent at 3 AM: latency, I/O throughput, and compliance.

The Kubernetes "Tax": When Overhead Eats Your Margins

Kubernetes (K8s) is the de facto standard. I get it. But running a control plane is not free. In a recent deployment for a client in Bergen, we analyzed the resource overhead of a vanilla K8s cluster versus Docker Swarm on identical hardware.

The results were sobering. On smaller nodes (2 vCPU, 4GB RAM), the K8s control plane components (API server, scheduler, controller-manager, and especially etcd) consumed nearly 25% of the available resources before a single application pod was deployed.
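You can verify this yourself on an idle cluster. A minimal check, assuming metrics-server is installed (kubectl top depends on it):

# Control plane footprint before any workloads land
kubectl top nodes
kubectl top pods -n kube-system

Add up the kube-system pods on a 4GB node and the tax becomes very visible.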

If you are managing a massive microservices architecture, this tax is worth paying. But if you are running five services and a database, you are burning money.

The etcd Storage Bottleneck

Here is the war story. We had a K8s cluster flapping constantly. Nodes going NotReady randomly. We blamed the network. We blamed the CNI plugin.

It wasn't the network. It was the disk.

Kubernetes relies on etcd for state, and etcd is incredibly sensitive to disk write latency. If fdatasync stalls, etcd misses heartbeats, leader elections fire, and the API server starts timing out, which looks exactly like a network problem. Most budget VPS providers oversell their storage I/O, and those "noisy neighbor" problems are what kill K8s clusters.

You need to verify your storage. Here is the fio command I use to benchmark disk latency before I even think about installing kubeadm:

# Simulate etcd's write pattern and measure fdatasync latency
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest

Check the fsync percentiles in fio's output. If the 99th percentile fdatasync latency is over 10ms, do not run Kubernetes on that node. etcd cannot keep up, and you will end up chasing the same phantom network problems we did.

Pro Tip: This is why we standardized on NVMe storage for all CoolVDS instances. When we run this benchmark on our Oslo nodes, we consistently see latencies under 2ms. Fast storage covers a multitude of configuration sins.
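If the cluster is already running, etcd will tell you directly whether the disk keeps up. A sketch, assuming a kubeadm-style setup where etcd exposes metrics on localhost:2381; adjust the endpoint for your topology:

# WAL fsync latency as etcd itself measures it
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds

Nearly all samples should land in the histogram buckets below 0.01 (10ms). If they don't, you have found your flapping nodes.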

Docker Swarm: The "Good Enough" Hero

While K8s 1.20 is making headlines, Docker Swarm is quietly running an enormous share of the internet's small-to-medium workloads. It is built into the Docker engine. There is no separate binary to install and no certificates to rotate manually (Swarm handles rotation automatically).

For a project last month, we needed to deploy a GDPR-compliant logging stack quickly. The deadline was tight. Setting up a highly available K8s cluster would have taken two days of hardening. Swarm took 10 minutes.

# On the manager node
docker swarm init --advertise-addr 10.10.10.5

# On the worker node
docker swarm join --token SWMTKN-1-49nj1cmql0jl3e... 10.10.10.5:2377

That is it. You have a cluster. However, Swarm has limits. It struggles with advanced autoscaling logic and complex stateful sets compared to K8s Operators. But if your goal is low TCO (Total Cost of Ownership), Swarm on a high-performance VPS is a lethal combination.
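Shipping an actual service onto that cluster is just as terse: a standard Compose file plus one command. A minimal sketch, with placeholder service name and image:

# stack.yml
version: "3.8"
services:
  web:
    image: nginx:1.19-alpine
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
    ports:
      - "80:80"

# Deploy it from a manager node
docker stack deploy -c stack.yml demo

Rolling updates, replica counts, and published ports all live in the same Compose format your developers already know.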

The Compliance Elephant: Schrems II and Hosting in Norway

We cannot talk about infrastructure in late 2020 without mentioning the legal landscape. The CJEU's Schrems II ruling in July invalidated the Privacy Shield. If you are a Norwegian company dumping customer data into US-owned clouds (AWS, GCP, Azure), you are now operating in a legal minefield regarding data transfers.

This is where local infrastructure becomes a technical requirement, not just a patriotic one. Hosting on CoolVDS servers located physically in Oslo/Europe simplifies your GDPR compliance posture significantly. You aren't just buying low latency to the NIX (Norwegian Internet Exchange); you are buying legal peace of mind.

Optimizing the Node: Kernel Tuning

Whether you choose K8s or Swarm, the default Linux kernel settings are rarely optimized for container workloads. High container density creates a lot of network namespace churn.

Add this to your /etc/sysctl.conf to prevent connection tracking table exhaustion—a common issue when you have hundreds of containers talking to each other:

# Increase connection tracking limits
net.netfilter.nf_conntrack_max = 131072
net.netfilter.nf_conntrack_generic_timeout = 120

# Allow more pending connections
net.core.somaxconn = 4096

# Enable forwarding for overlay networks
net.ipv4.ip_forward = 1

Apply it with sysctl -p. Do not skip this. I have seen load balancers drop packets purely because the kernel's default connection limit was stuck in 1999.
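To see how close a node already is to that ceiling, compare the live conntrack count against the limit (paths assume the nf_conntrack module is loaded, which it will be on any node running Docker with iptables):

# Current tracked connections vs. the ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

When the count creeps toward the max, the kernel starts dropping new connections and you will find "nf_conntrack: table full, dropping packet" in dmesg.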

Conclusion: Metal Matters More Than YAML

The Kubernetes vs. Swarm debate is often a distraction. The tool doesn't matter if the foundation is crumbling. If your storage I/O is slow, your control plane will flap no matter which orchestrator you run. If your network latency to users in Oslo is high, your app feels sluggish regardless of your orchestration efficiency.

For 2021, my advice is pragmatic: If you have a team of five DevOps engineers, go Kubernetes. If you are a lean team focused on product, stick with Swarm.

But whatever you choose, run it on hardware that keeps up. Container orchestration is I/O heavy. Don't let slow disks kill your uptime. Deploy a CoolVDS NVMe instance today and see what sub-millisecond latency does for your cluster stability.