Untangling the Mesh: A Real-World Look at Kubernetes 1.1 Networking
Everyone is talking about Google's Kubernetes right now. It hit version 1.0 last summer, and with the recent 1.1 release, it's finally looking stable enough for something other than a chaotic dev environment. But let's be honest: while the scheduling is brilliant, the networking model is absolutely unforgiving. If you are coming from a traditional Docker setup where you map ports manually (-p 8080:80), Kubernetes will feel like a bucket of cold water to the face.
I've spent the last three weeks debugging a multi-node cluster for a client in Oslo. The symptoms? Intermittent timeouts between services and random Connection Refused errors that seemed to depend on which node the pod landed on. It wasn't the app. It was the overlay network choking on a cheap, oversold hypervisor.
Here is the reality of running K8s in production today, and how to stop your packets from disappearing into the void.
The "Flat Network" Mandate
Kubernetes imposes a strict requirement: every Pod must be able to communicate with every other Pod without Network Address Translation (NAT).
On your laptop, this is easy. All containers share the same Docker bridge. But spread those containers across three different VPS nodes, and suddenly you have a routing nightmare. You cannot have overlapping IP ranges. If Node A assigns 172.17.0.2 to a container, and Node B assigns the same IP to another, your routing table is trash.
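You can see the collision waiting to happen on any stock Docker install. Every node picks its bridge range independently, with no knowledge of its neighbours (the output below is illustrative):

# Run this on each node
ip addr show docker0 | grep "inet "

# Node A: inet 172.17.0.1/16 scope global docker0
# Node B: inet 172.17.0.1/16 scope global docker0  <- identical range, container IPs collide across nodes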
We solve this with overlay networks. But overlays come with a tax.
The Contenders: Flannel vs. Weave
In the current ecosystem (January 2016), you essentially have two robust choices for networking plugins if you aren't rolling your own BGP setup.
1. Flannel (CoreOS)
Flannel is the pragmatic choice. It creates a simple overlay network on top of the host network. It assigns a subnet to each host (e.g., Node A gets 10.1.15.0/24, Node B gets 10.1.16.0/24) and stores these mappings in Etcd.
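A quick way to see which host claimed which subnet is to list the leases flanneld registers under its Etcd prefix. This assumes the default /coreos.com/network prefix, and the exact key format may differ on your flannel version:

# One lease per node
etcdctl ls /coreos.com/network/subnets

# Each lease maps a subnet to the owning node's public IP and backend type
etcdctl get /coreos.com/network/subnets/10.1.15.0-24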
The backend matters. By default, Flannel uses UDP encapsulation, which does the packet wrapping in userspace. It works everywhere, but it's slow. If your kernel has VXLAN support (any reasonably modern kernel does, including the 3.10 that ships with CentOS 7), the vxlan backend is significantly faster.
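Before committing to vxlan, confirm the kernel on your nodes actually ships the module:

# Load the vxlan module and confirm it is present
modprobe vxlan && lsmod | grep vxlan

# While you are at it, check what kernel you are really running
uname -r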
2. Weave Net
Weave is interesting because it doesn't require a distributed key-value store like Etcd to function; it gossips. It also offers encryption out of the box. However, that encryption burns CPU cycles. If you are running on a budget VPS with "shared" vCPUs, Weave's encryption overhead can introduce latency spikes that will kill your database connections.
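For reference, turning that encryption on is a single flag at launch time. This is a sketch against Weave Net 1.x; check weave --help on your version, and note that the password file path here is just an example:

# Use the same password on every node; Weave derives its session keys from it
weave launch --password "$(cat /etc/weave/password)"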
Configuration: Setting up Flannel correctly
Do not just run the default script and hope for the best. You need to configure the subnet allocation in Etcd before starting the flanneld service.
Here is how we set the network config in Etcd for a 10.1.0.0/16 cluster network:
etcdctl mk /coreos.com/network/config '{"Network":"10.1.0.0/16", "Backend": {"Type": "vxlan"}}'

Note the vxlan backend. Do not use UDP unless you are blocked by a firewall that drops VXLAN packets (rare, but it happens in restrictive corporate environments). Once flanneld is running, it writes the subnet it claimed to an environment file, and the Docker daemon has to be started with those values.
On a CentOS 7 node, check your /run/flannel/subnet.env. It should look like this:
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.15.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
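Those values are not decoration. The Docker daemon has to be started with them so containers land in the flannel-assigned range with the right MTU. A minimal sketch of the wiring (in practice you would bake this into the systemd unit or DOCKER_OPTS rather than type it by hand):

# Pull in the values flanneld wrote
source /run/flannel/subnet.env

# Start Docker with the flannel-assigned bridge subnet and MTU
docker daemon --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU}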
Pay attention to that MTU. The standard internet MTU is 1500. VXLAN adds headers, so the payload must be smaller (1450 bytes). If your underlying host network (the VPS provider's switch) drops fragmented packets or has a strict MTU policy, your large packets (like JSON responses or database syncs) will be dropped silently. This is the #1 cause of "it works for curl but fails for the app."
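You can verify the path MTU across the overlay with a do-not-fragment ping from one node to a pod IP on another node (the IP here is an example; 1422 bytes of payload plus 28 bytes of ICMP/IP headers hits the 1450 MTU exactly):

# Should succeed at the flannel MTU...
ping -c 3 -M do -s 1422 10.1.16.2

# ...and fail with "message too long" if you push past it
ping -c 3 -M do -s 1472 10.1.16.2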
The Performance "Tax" of Virtualization
This is where your choice of hosting provider becomes critical. Kubernetes is already adding a layer of abstraction. Docker adds another. The overlay network adds a third.
If your VPS provider is using OpenVZ or heavily overselling their CPU, the context switching required to encapsulate and decapsulate every single packet traveling between nodes will destroy your throughput. I have seen latency jump from 0.5ms to 15ms just because of noisy neighbors on a shared host.
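Measure it rather than guessing. Compare the round trip to a neighbouring node's host IP with the round trip to a pod on that same node, over the overlay (addresses are examples):

# Raw host-to-host latency on the provider's network
ping -c 50 -q 192.168.10.12

# Same physical path, but encapsulated through the VXLAN overlay
ping -c 50 -q 10.1.16.2

If the second number is dramatically worse than the first, the encapsulation work is being starved of CPU.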
For the project in Oslo, we migrated the nodes to CoolVDS. Why? Because KVM virtualization gives each guest a real kernel, so the vxlan module and NIC offloads work the way you expect, and the dedicated resource allocation means our CPU isn't fighting for time to process network interrupts.
Pro Tip: Check your latency to the Norwegian Internet Exchange (NIX). If your servers are hosted in Germany or the US, you are adding 30-100ms of round-trip time before your packet even hits the overlay network. For Norwegian users, keep your nodes in Oslo.
Debugging kube-proxy and iptables
In Kubernetes 1.1, kube-proxy still defaults to userspace proxying, but the iptables mode is available and much faster. It avoids the context switch between kernel and userspace for every packet.
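Switching is a single kube-proxy flag in 1.1, but the iptables proxier is still new, so treat this as a sketch and confirm against kube-proxy --help on your build (the API server URL is a placeholder):

# Run the proxy in iptables mode instead of the default userspace mode
kube-proxy --master=https://<apiserver>:6443 --proxy-mode=iptables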
To check if your services are actually routing, don't just stare at the logs. Look at what iptables is doing. A functional Service VIP (Virtual IP) should show up in the NAT table:
iptables -t nat -L KUBE-SERVICES -n | grep 10.0.0.15

If you see the rule there, but traffic isn't passing, verify that IP forwarding is actually enabled on your host. It sounds stupid, but I've seen it disabled by default on many "hardened" OS images.
sysctl -w net.ipv4.ip_forward=1

Data Sovereignty and The "Post-Safe Harbor" World
A quick note for the CTOs reading this. Since the European Court of Justice invalidated Safe Harbor last October, moving data between Norwegian users and US-hosted servers is legally risky. Datatilsynet (The Norwegian Data Protection Authority) is watching this space closely.
When you build your Kubernetes cluster, ensure your persistent volumes (PVs) are bound to nodes physically located in Norway or the EEA. Don't let your dynamic provisioner accidentally spin up storage in a US-East zone just because it was the default in your config file.
Summary
Kubernetes 1.1 is powerful, but it assumes your network is robust. Overlays like Flannel/VXLAN work well, but they require CPU power and stable MTU handling. Don't cheap out on the compute layer, or you will spend your weekends debugging dropped packets.
If you need a KVM environment that respects high I/O and low latency requirements for your cluster nodes, spin up a CoolVDS instance. The NVMe storage helps with Docker image pulls, and the network stability is exactly what Etcd needs to keep your cluster quorum healthy.