Taming the Beast: A Deep Dive into Kubernetes 1.1 Networking & Performance on Bare Metal

Let’s be honest: Kubernetes 1.0 was a proof of concept. But with the release of Kubernetes 1.1 last month, we are finally looking at a system that might actually survive a production load. However, if you have tried spinning up a cluster across multiple nodes manually—without the hand-holding of Google Container Engine (GKE)—you have likely hit the wall. That wall is networking.

I’ve spent the last three weeks debugging a distributed ReplicationController setup for a client in Oslo. The pods were running, but cross-node communication was dropping packets like a cheap switch. Everyone talks about the elegance of the "Pod-per-IP" model, but few explain how to actually implement it without wrecking your latency.

In this post, we are going to rip open the Kubernetes networking stack, look at why userspace proxying is killing your throughput, and explain why the recent Schrems I / Safe Harbor invalidation means you should probably be building this on Norwegian infrastructure anyway.

The "Flat Network" Lie

Kubernetes assumes a flat network address space. Every pod gets an IP. Every pod can talk to every other pod without NAT. It sounds beautiful until you have to implement it on Layer 3.
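
You can verify that promise directly once you have pods scheduled on two different nodes. A minimal check (the pod names and the 10.100.x.x address are placeholders, and it assumes the container image ships ping):

# Find the IP Kubernetes assigned to a pod running on another node
kubectl describe pod front-end-1a2bc | grep IP

# Ping it from a pod on this node: no NAT, no port mapping
kubectl exec web-9f3kd -- ping -c 3 10.100.42.8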

Unless you control the physical switches (which you don't on a VPS), you are forced to use an Overlay Network. In 2015, your main contenders are Flannel (CoreOS) and Weave.

The Flannel UDP vs. VXLAN Debate

We initially deployed Flannel using the UDP backend because it's the default. Big mistake. The context switching overhead of encapsulating packets in userspace caused serious I/O wait times on our database nodes.

If your kernel supports it (kernel 3.12+), you must use the VXLAN backend. It encapsulates packets in the kernel, avoiding the userspace round-trip.
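
Before flipping the backend, confirm the node can actually do VXLAN in the kernel. A ten-second sanity check:

# Kernel must be 3.12 or newer for a sane VXLAN experience
uname -r

# The vxlan module should load cleanly
modprobe vxlan && echo "vxlan OK"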

Here is the etcd config you need to push to force VXLAN. Don't leave this to the defaults:

# Populate etcd with the Flannel config
curl -L http://127.0.0.1:2379/v2/keys/coreos.com/network/config -XPUT -d value='{
  "Network": "10.100.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan"
  }
}'
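
After pushing that key, restart flanneld on every node and confirm it really negotiated VXLAN. The service name and paths below are assumptions; adjust them to however you launch the daemon:

# Restart the daemon so it picks up the new backend (service name may differ)
service flanneld restart

# flanneld writes its lease here; Docker needs these values at startup
cat /run/flannel/subnet.env

# "vxlan id 1" in the link details confirms kernel-space encapsulation
ip -d link show flannel.1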

Once we switched to VXLAN on our CoolVDS KVM instances, iperf tests between pods on different nodes jumped from ~400Mbps to near line-speed, and CPU steal time dropped to almost zero.
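
If you want to reproduce that number, plain iperf across the overlay is enough. The address below is just an example pod IP from the 10.100.0.0/16 range, and the images obviously need iperf installed:

# On a pod on worker A: listen
iperf -s

# On a pod on worker B: push traffic across the VXLAN overlay for 30s
iperf -c 10.100.24.7 -t 30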

The Performance Killer: Kube-Proxy Userspace Mode

By default, the kube-proxy component (which handles Service VIPs) runs in userspace mode. It installs iptables rules that capture traffic destined for a Service IP and redirect it into the kube-proxy process, which then proxies the connection to a backend Pod.

This is stable, but slow: every single packet takes an extra round-trip through userspace.

Kubernetes 1.1 introduces a new, albeit experimental, iptables mode. This moves the load balancing logic entirely into kernel-space using netfilter. If you care about request latency—and if you are running a Magento store or a high-traffic API, you definitely do—you need to enable this.

Here is how we configured the kube-proxy daemon on our worker nodes (running Ubuntu 14.04 LTS):

# /etc/default/kube-proxy

KUBE_PROXY_ARGS="--master=http://10.0.0.1:8080 \
--proxy-mode=iptables \
--kubeconfig=/var/lib/kube-proxy/kubeconfig \
--proxy-port-range=0-0"
Pro Tip: When using iptables mode, if the connection to the chosen pod fails, there is no automatic retry at the proxy level (unlike userspace mode). You need robust readiness probes defined in your pod specs to ensure traffic isn't sent to a dead container.
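
For reference, this is roughly what such a probe looks like in a v1 pod spec; the image, path, and timings are placeholders you will want to tune for your own app:

# Hypothetical pod with an HTTP readiness probe (Kubernetes v1 API)
cat <<'EOF' > web-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.9
    ports:
    - containerPort: 80
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      timeoutSeconds: 1
EOF
kubectl create -f web-pod.yaml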

The Legal Storm: Safe Harbor is Dead

Technicals aside, we need to talk about the elephant in the server room. In October, the European Court of Justice invalidated the Safe Harbor agreement (Schrems I). If you are a Norwegian business storing customer data, relying blindly on US-controlled clouds (AWS, Google, Azure) has just become a massive compliance risk.

The Norwegian Data Protection Authority (Datatilsynet) is already signaling stricter enforcement. This is where infrastructure ownership matters.

Running Kubernetes on CoolVDS isn't just a performance play; it's a sovereignty play. Our datacenters are in Oslo. Your data stays in Norway, governed by Norwegian law, not subject to the PATRIOT Act. For our enterprise clients, this legal stability is worth more than any raw compute metric.

Verifying the Stack

After setting up the overlay and the proxy, you need to verify the routing table. A healthy node running Flannel and Docker 1.9 should look something like this:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
10.100.0.0      0.0.0.0         255.255.0.0     U     0      0        0 flannel.1
10.100.63.0     0.0.0.0         255.255.255.0   U     0      0        0 docker0

Note that docker0 sits inside the Flannel range: Docker is started with --bip set to the /24 that flanneld leased for this node. If docker0 is still on its default 172.17.0.0/16, or overlaps with your physical network, you are going to have a bad time. Always ensure the --bip setting in the Docker daemon config comes from Flannel's lease and does not conflict with the node IP.
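
One way to wire this up on Ubuntu 14.04 (a sketch, assuming flanneld has written /run/flannel/subnet.env before Docker starts and that your init script sources /etc/default/docker):

# /etc/default/docker (sourced before the Docker daemon starts)
# Pull in FLANNEL_SUBNET and FLANNEL_MTU from flanneld's lease
. /run/flannel/subnet.env
DOCKER_OPTS="--bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}"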

Why Hardware Matters for K8s

Kubernetes is resource-hungry. The etcd cluster needs fast disk I/O to maintain consensus (the Raft protocol fsyncs its write-ahead log on every commit). If your disk latency spikes, the cluster churns through leader elections and effectively falls apart. We have seen this happen repeatedly on shared hosting providers that oversell their storage.
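
A crude but useful smoke test: measure synchronous write latency on the volume backing etcd's data directory (assuming the default /var/lib/etcd), then check the cluster's own view of its health:

# Synchronous 512-byte writes: a rough proxy for etcd's fsync-heavy workload
dd if=/dev/zero of=/var/lib/etcd/latency-test bs=512 count=1000 oflag=dsync
rm /var/lib/etcd/latency-test

# Raft consensus status as etcd sees it
etcdctl cluster-health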

This is why we equip CoolVDS instances with enterprise NVMe storage. The IOPS capability of NVMe is essential for keeping etcd stable under load. When you combine that with our direct peering at NIX (the Norwegian Internet Exchange), you get low-latency access for your users in Oslo and Bergen that international providers simply cannot match physically.

Final Thoughts

Kubernetes 1.1 is powerful, but it requires a solid understanding of Linux networking primitives. Don't settle for default configurations. Switch Flannel to VXLAN, enable iptables proxy mode, and ensure your data sovereignty is respected by hosting on local soil.

Ready to build a cluster that doesn't falter under load? Deploy a high-performance NVMe VPS on CoolVDS today and get root access in under 55 seconds.