Kubernetes Networking Deep Dive: Moving Beyond iptables Hell in Production
It is 3:00 AM. Your monitoring dashboard is bleeding red. The API server is timing out, and half your pods are in a CrashLoopBackOff state. You check the logs, and it's the vague, infuriating silence of a network timeout. If you have been running Kubernetes in production since the v1.5 days, you know this pain. Networking is the soft underbelly of the container orchestration beast.
By now, in early 2019, Kubernetes v1.13 has stabilized many of the rough edges. We aren't manually stitching Docker bridges together like it's 2015 anymore. Yet, complexity has merely shifted layers. We have traded manual routing for CNI plugins, overlay networks, and a tangle of iptables rules that would make a kernel developer weep. If you are deploying clusters for Norwegian enterprises, you are also juggling the strict latency requirements of NIX (Norwegian Internet Exchange) connectivity and the absolute mandate of GDPR compliance.
This isn't a "Hello World" tutorial. This is a look at why your packet throughput sucks and how to fix it using the latest stable features available today.
The Hidden Tax of Overlay Networks
Most default K8s installations shove you toward a VXLAN overlay. It works out of the box. It is also a performance vampire. Every packet leaving a pod gets encapsulated, adding CPU overhead and reducing the MTU (Maximum Transmission Unit). In a high-throughput environment, say a video streaming backend serving Oslo or a fintech API, that overhead compounds.
If you are running on bare metal or a high-performance VPS like CoolVDS, you should consider routing directly. However, if you must use an overlay, understanding the CNI (Container Network Interface) choice is critical. Flannel is simple but simplistic. Calico is the heavy hitter we prefer for production workloads requiring network policies.
Configuring Calico for Performance
Don't just apply the default YAML. For a setup handling significant traffic, you need to tune the MTU to match your underlying host interface. If your host allows Jumbo Frames (MTU 9000), your CNI must know about it.
# calico-config.yaml excerpt
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # 8950 leaves 50 bytes of headroom on a 9000-byte link: enough for
  # VXLAN (50-byte header) and more than enough for IP-in-IP (20 bytes)
  veth_mtu: "8950"
  typha_service_name: "none"
  calico_backend: "bird"
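Before trusting that number, verify what the host NIC actually negotiates, and after the rollout confirm the value propagated. A quick sanity check, assuming your uplink is eth0 (substitute your own interface name):
# Confirm the physical NIC really negotiates jumbo frames
ip link show eth0 | grep -o 'mtu [0-9]*'

# After applying the ConfigMap and recycling the calico-node pods,
# check that the tunnel interface and the pod veths picked up the new value
ip link show tunl0 | grep -o 'mtu [0-9]*'
ip -o link show | grep cali | grep -o 'mtu [0-9]*' | sort | uniq -c
Existing pods typically keep their old veth MTU until they are rescheduled, so plan a rolling restart of workloads after the change.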
Pro Tip: Always check your node-to-node latency before blaming Kubernetes. Use iperf3 between your worker nodes. If the underlying network is jittery, no amount of CNI tuning will save you. At CoolVDS, we map our KVM instances to physical cores to prevent the "noisy neighbor" CPU steal that kills network throughput.
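Here is the kind of baseline we mean, a rough sketch assuming iperf3 is installed on both workers and 10.0.0.11 is node A's address (use your own IPs):
# On node A: start the server
iperf3 -s

# On node B: 30-second test, 4 parallel streams, then the reverse direction
iperf3 -c 10.0.0.11 -t 30 -P 4
iperf3 -c 10.0.0.11 -t 30 -P 4 -R

# While the test runs, keep an eye on the steal (st) column on both nodes
vmstat 1
If throughput sits well below line rate or the steal column creeps above a couple of percent, fix the infrastructure before touching the CNI.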
The Iptables Bottleneck vs. IPVS
Until recently, kube-proxy relied almost exclusively on iptables to handle Service discovery and load balancing. Here is the math problem: iptables rule processing is sequential, O(N). When you have 5,000 services in a cluster, the kernel has to traverse a massive list of rules for every packet.
I have seen clusters freeze simply because the iptables restore process took too long, causing the API server to think the node was dead. The solution, GA since Kubernetes 1.11 and thoroughly proven by 1.13, is IPVS (IP Virtual Server). IPVS uses hash tables: O(1) lookups. It doesn't care if you have 10 services or 10,000.
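You can gauge the scale of the problem on any busy node without special tooling; the KUBE-SVC and KUBE-SEP chains below are the standard ones kube-proxy generates:
# Total NAT rules the kernel has to consider for service traffic
iptables-save -t nat | wc -l

# How many of those belong to kube-proxy service and endpoint chains
iptables-save -t nat | grep -c -e KUBE-SVC -e KUBE-SEP

# How long a full ruleset takes just to parse -- this is what stalls at scale
time iptables-restore --test < <(iptables-save -t nat)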
To enable this, you need to ensure your underlying Linux kernel modules are loaded (specifically ip_vs, ip_vs_rr, ip_vs_wrr, ip_vs_sh) and modify your kube-proxy config.
Enabling IPVS Mode
First, load the modules on your worker nodes:
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4   # merged into plain nf_conntrack on kernels 4.19 and newer
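modprobe only lasts until the next reboot. On a systemd-based distro you would typically persist the list as well (a sketch; the file name is arbitrary):
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF

# Verify everything is loaded
lsmod | grep -e ip_vs -e nf_conntrack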
Then, update the kube-proxy ConfigMap:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # Round Robin
  strictARP: true
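The new mode only takes effect once kube-proxy restarts. On a kubeadm-style cluster, assuming the stock kube-proxy DaemonSet with its k8s-app=kube-proxy label, the rough sequence looks like this:
# Edit the live ConfigMap, then recycle the kube-proxy pods
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system delete pod -l k8s-app=kube-proxy

# On a worker node, confirm the proxier actually switched
curl -s http://127.0.0.1:10249/proxyMode

# Inspect the IPVS virtual servers and their backends
# (install the ipvsadm package if it is missing)
ipvsadm -Ln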
If you are hosting on older infrastructure, check your kernel version. IPVS itself has been in the kernel for years, but the 4.19 LTS line carries the most mature IPVS and conntrack code, and it is where we see the fewest surprises. This is why we keep the CoolVDS kernel templates rigorously updated.
DNS Latency: The Silent Killer
As of Kubernetes 1.13, CoreDNS is the default DNS server, replacing kube-dns. This is a massive improvement, but it is not magic. In Norway, we often deal with applications that make extensive external API calls (payment gateways like Vipps, data providers, etc.).
A common issue is the ndots:5 default in the /etc/resolv.conf that kubelet injects into pods. Any name with fewer than five dots is tried against every entry in the pod's search path (namespace.svc.cluster.local, svc.cluster.local, cluster.local, and so on) before the resolver finally queries the external name as-is. This can double or triple the latency of every external request.
If your application talks mostly to the outside world, lower the ndots configuration in your Pod spec:
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
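To see what a pod actually inherited, and whether the change takes, inspect resolv.conf from inside the container. Here app-pod and api.vipps.no are placeholders, and the nslookup step assumes the image ships basic DNS tooling:
# What search path and ndots value did the pod get?
kubectl exec app-pod -- cat /etc/resolv.conf

# Watch the resolver walk the search domains for an external name
kubectl exec app-pod -- nslookup api.vipps.no

# CoreDNS keeps the old kube-dns label; enable its log plugin temporarily
# and you will see the NXDOMAIN hits for each search-domain permutation
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20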
The Storage-Network Nexus: Etcd
You might ask, "Why is a networking article talking about storage?" Because Kubernetes networking state lives in etcd. If etcd is slow, your service updates are slow. If etcd crashes, your cluster is brain-dead.
Etcd relies heavily on fsync calls to ensure data durability. If your VPS provider is putting you on standard spinning rust or oversold SSDs, the WAL fsync latency (etcd's etcd_disk_wal_fsync_duration_seconds metric) will spike. When that happens, heartbeats are delayed, leader elections start churning, and network updates stop propagating.
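On a kubeadm-built control plane you can read that histogram straight from the local etcd member; the certificate paths below are the kubeadm defaults, so adjust them if your layout differs:
# Pull the WAL fsync histogram from the local etcd member
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
     --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
     https://127.0.0.1:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds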
| Metric | Standard SSD VPS | CoolVDS NVMe |
|---|---|---|
| Random IOPS (4k) | ~5,000 - 10,000 | ~80,000+ |
| Disk Latency | 2-5ms | < 0.5ms |
| Etcd Stability | Risky at scale | Rock Solid |
We mandate NVMe storage for any etcd node. It is not an upsell; it is a technical requirement for a stable control plane.
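Don't take any provider's word for it, ours included. A common way to approximate etcd's WAL write pattern is the fio job below, following the widely circulated etcd disk-check recipe (the path and job name are arbitrary):
# Small sequential writes with an fdatasync after each, like etcd's WAL
mkdir -p /var/lib/etcd-bench
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-bench --size=22m --bs=2300 \
    --name=etcd-wal-check

# In the output, the fdatasync 99th percentile should stay under 10ms;
# on decent NVMe it is typically a fraction of that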
Local Nuances: Latency and Law
For those of us operating out of Oslo or dealing with the Datatilsynet, location matters. Routing traffic through Frankfurt or London adds 20-30ms of latency. It also introduces potential data residency headaches, especially with the uncertainty surrounding the Privacy Shield framework and the aggressive reach of the US CLOUD Act.
Hosting locally in Norway isn't just about nationalism; it is about physics and compliance. Keeping the packet travel distance short keeps your application snappy. CoolVDS infrastructure is physically located to optimize routing to the major Nordic internet exchanges, ensuring that your local traffic stays local.
Final Thoughts
Kubernetes in 2019 is powerful, but it assumes your underlying infrastructure is competent. You can tune sysctl flags and optimize CNI configs all day, but if the hypervisor is stealing your CPU cycles or the disk is choking on I/O, you will never achieve stability.
Don't let legacy hosting architectures be the bottleneck for your modern stack. If you are ready to stop debugging network timeouts and start shipping code, test your cluster on infrastructure built for this exact workload.
Deploy a high-performance NVMe instance on CoolVDS today and see what sub-millisecond latency actually feels like.