Kubernetes 1.5 Networking Deep Dive: Surviving the Overlay Jungle
It works on your laptop. Of course it does. `minikube` is forgiving. But move that cluster to a staging environment spanning three different nodes, and suddenly your microservices aren't talking. They're screaming into the void. Welcome to the reality of Kubernetes networking in 2017.
I’ve spent the last month debugging a distributed Cassandra cluster on Kubernetes where read latencies were spiking randomly. The culprit wasn't Java garbage collection. It wasn't disk I/O. It was a misconfigured overlay network choking on packet encapsulation overhead on a budget VPS provider.
If you are serious about running containers in production, you need to stop treating the network like a black box. Today, we rip open the hood of the Container Network Interface (CNI), `kube-proxy`, and `iptables`. We are going to look at why physical infrastructure latency—specifically here in the Nordics—is the invisible wall you keep hitting.
The "Flat Network" Lie
Kubernetes imposes a strict requirement: Every Pod must have a unique IP address, and all Pods must be able to communicate with all other Pods without NAT (Network Address Translation). It sounds simple, like a flat LAN. But underneath, unless you own the switch fabric, you are likely relying on an overlay network.
In a typical setup using Flannel (a popular CNI choice right now), your traffic is encapsulated in VXLAN frames, which themselves ride on UDP (port 8472 by default on Linux), to traverse the host network. This adds CPU overhead. Every packet is wrapped, shipped, unwrapped, and delivered.
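You don't have to take the encapsulation on faith. Flannel's VXLAN backend creates a dedicated device on every node; a quick look at it shows the VNI, the UDP port, and the reduced MTU where the ~50 bytes of overhead live. A minimal check, assuming Flannel's default interface name `flannel.1`:

# Show the VXLAN device Flannel creates on each node ("-d" prints vxlan details:
# VNI, UDP destination port, and the MTU the overlay actually uses)
ip -d link show flannel.1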
Here is what a standard Flannel subnet configuration looks like in `etcd`. If you aren't checking this, you're flying blind:
{
  "Network": "10.244.0.0/16",
  "SubnetLen": 24,
  "Backend": {
    "Type": "vxlan",
    "VNI": 1
  }
}
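If you run Flannel the standalone way (common with 1.5 clusters), `flanneld` reads this config from etcd at startup. A sketch of reading and writing it with the etcd v2 `etcdctl` commands, assuming the conventional `/coreos.com/network/config` key:

# Read the current Flannel network config (etcd v2 API, default key path)
etcdctl get /coreos.com/network/config

# Write the config shown above; do this before starting flanneld on any node
etcdctl set /coreos.com/network/config '{ "Network": "10.244.0.0/16", "SubnetLen": 24, "Backend": { "Type": "vxlan", "VNI": 1 } }'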
If your underlying host has "noisy neighbors" stealing CPU cycles, that VXLAN encapsulation slows down. Your database connection times out. Your app crashes. This is why we deploy on CoolVDS KVM instances. We need guaranteed CPU cycles to handle the network stack processing without jitter.
Kube-Proxy and the IPTables Maze
Since Kubernetes 1.2, `iptables` mode has been the default for `kube-proxy`. This was a massive upgrade from the old userspace mode, which was a performance nightmare. However, it means debugging services requires reading `iptables` rules. And there are thousands of them.
When you create a Service in Kubernetes, `kube-proxy` writes rules to intercept traffic destined for the Service IP and redirects it to one of the backing Pods.
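Before touching `iptables`, confirm the Service actually has endpoints. An empty endpoints list means your label selector matches nothing, which is an application bug, not a networking one. A quick sanity check, assuming a Service named `my-service` in the `default` namespace:

# Does the Service exist and have a cluster IP?
kubectl get svc my-service -o wide

# Does it have backing Pods? An empty list here means the selector matches
# nothing -- fix that before blaming the network.
kubectl get endpoints my-service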
Let's look at what happens on a node when you wonder why a Service isn't reachable. Run this:
sudo iptables-save | grep KUBE-SVC
You will see a chain reference. Follow it. For a Service with two endpoints (Pods), the probability distribution rules look like this:
-A KUBE-SVC-XYZ -m comment --comment "default/my-service:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-ABC
-A KUBE-SVC-XYZ -m comment --comment "default/my-service:" -j KUBE-SEP-DEF
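Each `KUBE-SEP-*` chain ends in a DNAT rule pointing at a concrete Pod IP and port. Following the first endpoint chain from the rules above (the `KUBE-SEP-ABC` name is just the placeholder from this example):

# Dump the endpoint chain referenced by the service chain
sudo iptables-save | grep KUBE-SEP-ABC

# You are looking for something like:
#   -A KUBE-SEP-ABC ... -j DNAT --to-destination <pod-ip>:<pod-port>
# If the Pod IP in that rule no longer matches a running Pod, kube-proxy
# has stopped syncing rules on this node.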
Pro Tip: If you see thousands of rules, rule evaluation time increases. While Linux is efficient, latency adds up. This is where kernel tuning becomes mandatory. On our high-performance nodes, we tweak `sysctl.conf` to handle higher connection tracking limits, or we risk dropping packets silently.
Tuning Sysctl for K8s Networking
Don't deploy a cluster without applying these settings on your worker nodes. The defaults on most Linux distributions (CentOS 7, Ubuntu 16.04) are too conservative for container traffic.
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.netfilter.nf_conntrack_max = 131072
net.ipv4.tcp_tw_reuse = 1
Note: The `bridge-nf-call-iptables` setting is critical. Without it, bridged traffic might bypass `iptables` entirely, breaking your Service discovery and security policies.
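A minimal way to apply and persist these settings, assuming a dedicated drop-in file under `/etc/sysctl.d/` (the filename is a convention, not a requirement). Note that `bridge-nf-call-iptables` only exists once the `br_netfilter` module is loaded:

# Load the bridge netfilter module, or the bridge-nf-call setting won't exist
sudo modprobe br_netfilter

# Persist the settings in a drop-in file and load them immediately
cat <<'EOF' | sudo tee /etc/sysctl.d/99-kubernetes-net.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.netfilter.nf_conntrack_max = 131072
net.ipv4.tcp_tw_reuse = 1
EOF
sudo sysctl --system

# Watch conntrack usage; if count creeps toward max, connections get dropped
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max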
CNI Showdown: Flannel vs. Calico
In early 2017, the debate is usually between Flannel and Calico.
| Feature | Flannel (VXLAN) | Calico (BGP) |
|---|---|---|
| Complexity | Low (Easy setup) | High (Requires BGP knowledge) |
| Performance | Medium (Encap overhead) | High (Native routing) |
| Network Policies | No (Open network) | Yes (Security constraints) |
We often recommend Calico for enterprise deployments where you need to lock down traffic between namespaces. However, Calico works best when your nodes share a stable Layer 2 segment, so that BGP-learned routes can be used directly; otherwise it has to fall back to IP-in-IP encapsulation (IP protocol 4). If your VPS provider filters MAC addresses aggressively or silently drops unfamiliar protocols like IPIP, Calico breaks.
This is a common pain point with budget clouds. At CoolVDS, we configure our KVM virtualization to be transparent enough to allow these protocols to function correctly. We don't artificially restrict your network capabilities.
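Two quick checks before concluding Calico itself is at fault, assuming `calicoctl` is installed on the node and you are running in IP-in-IP mode:

# Is BGP peering established with the other nodes?
sudo calicoctl node status

# Is IP-in-IP (IP protocol 4) traffic actually arriving, or is the provider
# silently dropping it somewhere in the path?
sudo tcpdump -ni eth0 ip proto 4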
The Norwegian Latency Advantage
Let's talk geography. If your users are in Oslo or Bergen, but your Kubernetes nodes are in a datacenter in Frankfurt or Amsterdam, you are adding 20-30ms of round-trip time (RTT) before the request even hits your ingress controller. Add the overlay network overhead (~2ms), the `iptables` processing, and the application logic, and your snappy app feels sluggish.
Data sovereignty is also looming. With the GDPR enforcement date approaching next year (2018), keeping data within national borders is becoming a priority for Datatilsynet (The Norwegian Data Protection Authority). Hosting on local infrastructure isn't just about speed anymore; it's about compliance strategy.
By peering directly at NIX (Norwegian Internet Exchange), CoolVDS ensures that local traffic stays local. Low latency isn't a luxury; in a microservices architecture where one user request spawns 50 internal RPC calls, latency is the killer of scalability.
Debugging DNS (Because it's always DNS)
In Kubernetes 1.5, we rely on `kube-dns`. The old `skydns` + `kube2sky` + `etcd` stack is gone; the pod now bundles a `kubedns` container (built on the SkyDNS library), a `dnsmasq` cache, and a health-check container. When Services fail to resolve, check the logs of the `kube-dns` pod in the `kube-system` namespace immediately.
kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
If you see timeouts here, your UDP packets are likely being dropped. Check the `mtu` (Maximum Transmission Unit). If your host interface MTU is 1500 and Flannel adds a 50-byte header, but the inner container interface is also set to 1500, packets will be fragmented or dropped. Ensure your CNI MTU is smaller than the Host MTU (e.g., set Flannel to 1450).
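A quick way to confirm an MTU mismatch from a worker node, assuming Flannel's default `flannel.1` device and `eth0` as the host interface (adjust names to your setup):

# Compare host and overlay MTUs; the overlay must be at least 50 bytes smaller
ip link show eth0 | grep mtu
ip link show flannel.1 | grep mtu

# Probe the node-to-node path with "do not fragment" set: 1472 bytes of payload
# plus 28 bytes of ICMP/IP headers = 1500. If this fails, something in the path
# has a smaller MTU than you assume. (Replace NODE_IP with another node's address.)
ping -M do -s 1472 -c 3 NODE_IP

# Flannel records the MTU it expects container interfaces to use here
cat /run/flannel/subnet.env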
Conclusion
Kubernetes networking is powerful, but it assumes a robust underlying infrastructure. You cannot build a skyscraper on a swamp. If you are struggling with intermittent connection resets or high latency between pods, stop blaming your code.
Look at your `iptables`. Look at your CPU steal time. Look at your provider.
We built CoolVDS to be the foundation where these abstractions actually work. With NVMe storage to keep `etcd` happy and unthrottled network IO for your overlays, you can stop fighting the infrastructure and start shipping code.
Ready to stabilize your cluster? Spin up a CoolVDS KVM instance in Oslo today and see the difference 2ms latency makes.