
Kubernetes Networking Deep Dive: CNI, Iptables, and Performance in v1.3

Surviving the Kubernetes Networking Maze: A Deep Dive for 2016

Let's be honest: Kubernetes is eating the world, but it might be chewing up your network throughput in the process. With the recent release of Kubernetes 1.3, we finally have PetSets for stateful apps and better scalability, but the networking model remains the single biggest hurdle for anyone migrating from a monolithic LAMP stack to microservices. I've spent the last month debugging a cluster that kept dropping packets between nodes, only to realize the overlay network's encapsulation was burning CPU on interrupt handling.

Docker 1.12 just dropped with Swarm Mode baked in, trying to make orchestration look "easy." But we know "easy" usually means "hard to debug later." If you are serious about production workloads, you are likely sticking with Kubernetes. But to keep it running, you need to understand what happens to a packet when it leaves a Pod.

The Fundamental Challenge: The Flat Network

Kubernetes imposes a strict requirement: every Pod must be able to communicate with every other Pod without Network Address Translation (NAT). In a local development environment using Minikube, this is trivial. Across a distributed cluster of Virtual Private Servers (VPS), it’s a nightmare.

If you just spin up Docker on two different servers, the containers on Server A have no idea how to route to containers on Server B. They might even have overlapping IP subnets (the dreaded 172.17.0.0/16 default). To fix this, we need a CNI (Container Network Interface) plugin.
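
You can see the collision for yourself. On two freshly provisioned hosts, Docker picks the identical default bridge subnet (the interface name docker0 is the stock default):

# Server A
ip addr show docker0
#   inet 172.17.0.1/16 scope global docker0

# Server B
ip addr show docker0
#   inet 172.17.0.1/16 scope global docker0  <-- same subnet; container IPs
#   on the two hosts collide and cannot be routed to each other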

The CNI Showdown: Flannel vs. Calico

Right now, in mid-2016, you effectively have two main choices for your networking backend if you are running on self-managed infrastructure like CoolVDS.

Feature      | Flannel (VXLAN)          | Calico (BGP)
-------------|--------------------------|------------------------------
Mechanism    | Encapsulation (overlay)  | Layer 3 routing
Performance  | Medium (packet overhead) | High (near-metal speed)
Complexity   | Low (just works)         | High (requires BGP knowledge)
Datastore    | etcd                     | etcd

For 90% of use cases, I see teams defaulting to Flannel with VXLAN. It wraps your Pods' Ethernet frames inside UDP packets to cross the physical network. It's simple, but it comes with a tax: MTU issues and CPU overhead for encapsulation/decapsulation.

Pro Tip: If you are using Flannel on CoolVDS, check your interface MTU. The standard Ethernet MTU is 1500. VXLAN adds a 50-byte header. If you don't lower the MTU inside your containers to 1450, you will see random connection drops on large payloads (like MySQL query results) while pings work fine.
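
Here is a quick way to confirm the math on a node (the device name flannel.1 is Flannel's VXLAN default, but may differ in your setup):

# Physical NIC vs. the VXLAN device Flannel creates
ip link show eth0       # ... mtu 1500 ...
ip link show flannel.1  # should read mtu 1450 (1500 minus the 50-byte VXLAN header)

# Reproduce the symptom with a max-size, non-fragmenting ping.
# 1450 MTU - 20 (IP header) - 8 (ICMP header) = 1422 bytes of payload.
ping -M do -s 1422 <pod-ip-on-other-node>   # passes at 1422, fails at 1423 if the MTU is right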

Under the Hood: The iptables Spaghetti

When you create a Service in Kubernetes, it gets a virtual IP (ClusterIP). But that IP doesn't exist on any interface. It's a lie maintained by iptables.
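
You can prove this to yourself on any node (using the 10.100.0.1 ClusterIP from the listing below):

# The Service IP is bound to no interface; this prints nothing anywhere:
ip addr | grep 10.100.0.1

# Yet iptables happily rewrites traffic destined for it:
iptables -t nat -L KUBE-SERVICES -n | grep 10.100.0.1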

The kube-proxy component watches the API server and updates the firewall rules on every node. In Kubernetes 1.0/1.1, we had a userspace proxy that was painfully slow. Since 1.2, the iptables mode has been the default, and it is much faster, but it makes debugging hell.
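
If you are unsure which proxier a node is actually running, the flag makes it explicit:

# The fast kernel-space proxier (the default since 1.2)
kube-proxy --proxy-mode=iptables --master=http://<apiserver>:8080

# The old userspace proxier, still available for comparison and debugging
kube-proxy --proxy-mode=userspace --master=http://<apiserver>:8080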

If you run a high-traffic service, your nat table looks like this:

# iptables -t nat -L KUBE-SERVICES -n | head -n 10
Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
KUBE-SVC-X6W...  tcp  --  0.0.0.0/0            10.100.0.1           /* kubernetes default service cluster ip */ tcp dpt:443
KUBE-SVC-NPX...  tcp  --  0.0.0.0/0            10.100.0.10          /* kube-dns cluster ip */ tcp dpt:53
...

Every time a packet hits a Service IP, iptables uses the statistic module to randomly select a backend Pod (Random Balancing). This happens in kernel space, which is fast, but if you have 5,000 services, the sequential rule evaluation starts to introduce latency.
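
Here is what that balancing step looks like for a Service with three backend Pods (chain names shortened; the probabilities of 1/3, then 1/2 of the remainder, then the rest are exactly what kube-proxy generates):

# iptables -t nat -L KUBE-SVC-EXAMPLE -n
Chain KUBE-SVC-EXAMPLE (1 references)
target         prot opt source               destination
KUBE-SEP-AAA   all  --  0.0.0.0/0            0.0.0.0/0    statistic mode random probability 0.33333
KUBE-SEP-BBB   all  --  0.0.0.0/0            0.0.0.0/0    statistic mode random probability 0.50000
KUBE-SEP-CCC   all  --  0.0.0.0/0            0.0.0.0/0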

Why Virtualization Choice Impacts K8s Performance

This is where your infrastructure provider makes or breaks you. I've seen dev teams try to run Kubernetes on OpenVZ containers. Don't do it.

Kubernetes needs to manipulate kernel modules, mount overlays, and control iptables extensively. Shared-kernel virtualization (like OpenVZ or LXC) often restricts these capabilities or shares the connection tracking table (conntrack) across all tenants. If your noisy neighbor gets DDoS'd, the conntrack table fills up, and your Kubernetes cluster stops accepting new connections.
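
On a KVM guest the conntrack table is yours alone, and you can size it for your workload (the value below is illustrative, not a recommendation):

# Current usage vs. the ceiling
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max

# Raise the ceiling on connection-heavy nodes
sysctl -w net.netfilter.nf_conntrack_max=262144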

At CoolVDS, we exclusively use KVM (Kernel-based Virtual Machine). Each VPS has its own kernel. You can load your own modules. More importantly, the network drivers use virtio, allowing for near-native I/O performance.
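
You can verify the paravirtualized driver from inside the guest:

# A virtio NIC reports the virtio_net driver
ethtool -i eth0
# driver: virtio_net
# ...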

If you are deploying a DaemonSet for networking, you need access to the host network namespace. Here is how we configure Calico nodes in production on KVM instances:

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: calico-node
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        k8s-app: calico-node
    spec:
      # Run in the host's network namespace so Calico can program
      # routes and iptables rules on the node itself
      hostNetwork: true
      containers:
        - name: calico-node
          image: calico/node:v0.20.0
          securityContext:
            # calico-node manipulates kernel networking state
            privileged: true
          env:
            - name: ETCD_ENDPOINTS
              value: "http://10.96.0.232:2379"

The Nordic Context: Data Sovereignty & Latency

We are operating in a post-Safe Harbor world. The EU-US Privacy Shield was adopted last month, but uncertainty remains. For Norwegian businesses, storing data outside the EEA is becoming a legal headache. Datatilsynet (The Norwegian Data Protection Authority) is watching closely.

Hosting your Kubernetes cluster on US-based clouds introduces legal risk and, more pragmatically, latency. Light takes time to travel. A round trip from Oslo to Ashburn, Virginia is ~90ms. From Oslo to a local CoolVDS datacenter? <5ms.
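
Don't take my word for the numbers; measure from your own network (hostnames are placeholders):

# From a machine in Oslo
ping -c 10 <host-in-us-east>    # expect ~90 ms round trips
ping -c 10 <host-in-oslo-dc>    # expect low single-digit ms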

When you are running a microservices architecture where one user request might spawn 20 internal RPC calls between Pods, that latency compounds. 5ms vs 90ms isn't just network speed; it's the difference between a snappy UI and a frustrated customer.

Deploying a Test Cluster

If you want to verify the throughput difference yourself, grab iperf. Spin up two CoolVDS instances (our NVMe tier is recommended for etcd performance) and run a benchmark.

# On Node A (Server)
iperf -s

# On Node B (Client)
iperf -c <Node-A-IP> -t 30
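
To isolate the overlay tax, repeat the test Pod-to-Pod through Flannel. A rough sketch with kubectl run (the image name is a placeholder; any image with iperf installed will do):

# Server Pod
kubectl run iperf-server --image=<iperf-image> -- iperf -s

# Find its Pod IP, then drive traffic from a second Pod
kubectl get pods -o wide | grep iperf-server
kubectl run iperf-client --image=<iperf-image> -- iperf -c <server-pod-ip> -t 30

# The delta versus the host-to-host numbers above is your encapsulation cost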

You will see that dedicated KVM resources hold up under load where oversold clouds falter. Kubernetes is complex enough without fighting your infrastructure. Build on solid ground.

Ready to architect a cluster that actually performs? Deploy a KVM-based instance on CoolVDS in under 55 seconds and keep your data safely in Norway.