Demystifying Kubernetes Networking: From Iptables Hell to CNI Bliss
It is 3:00 AM. Your pager is screaming. The API is returning 504 Gateway Timeouts, but only intermittently. You check the Pod status: Running. You check the logs: Connection timed out. Welcome to the invisible warzone of Kubernetes networking.
Most sysadmins treat Kubernetes networking as a black box. You apply a CNI (Container Network Interface) plugin, hope for the best, and pray you never have to debug iptables rules generated by kube-proxy. But in 2019, with production workloads shifting rapidly to k8s v1.15, ignorance is no longer an excuse. If you don't understand how a packet moves from Pod A to Pod B, you aren't managing a cluster; you are just watching it burn.
In this deep dive, we are going to strip away the abstraction layers. We will look at packet encapsulation, why iptables is becoming a bottleneck, and why your choice of infrastructure—specifically, the underlying network latency and IOPS provided by your VPS host—matters more than your YAML configuration.
The Hidden Cost of Overlay Networks
In a standard Kubernetes setup, every Pod gets its own IP address. To achieve this across multiple nodes without complex routing protocols (like BGP) at the switch level, we often use overlay networks. Tools like Flannel typically default to VXLAN encapsulation.
Here is the reality: Encapsulation costs CPU cycles. Every packet leaving a Pod is wrapped in a UDP packet, sent over the wire, and unwrapped on the destination node. If your underlying VPS infrastructure has "noisy neighbors" or high CPU steal time (cycles the hypervisor hands to other tenants), your network throughput tanks. This is physics.
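A quick way to see whether you are paying that tax on a noisy host is to watch the steal column — a rough sketch, and the "few percent" threshold is a rule of thumb, not a hard limit:

# Watch the "st" (steal) column: time the hypervisor took away from this VM
vmstat 1 5
# Same figure in a single snapshot; consistent steal above a few percent is a red flag
top -bn1 | grep "Cpu(s)"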
Pro Tip: Always check your MTU (Maximum Transmission Unit). A common issue in Norway when traffic crosses several ISPs is packet fragmentation. If your physical interface MTU is 1500, your VXLAN overlay must be lower (usually 1450) to account for the 50-byte VXLAN header. If you miss this, you get silent packet drops.
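To verify you are not silently fragmenting, compare the MTU of the physical NIC with the overlay interface, then probe the path with the Don't-Fragment bit set. The interface names and the Pod IP below are placeholders for your own environment:

# Physical NIC vs. VXLAN overlay: the overlay should be at least 50 bytes smaller
ip link show eth0 | grep mtu
ip link show flannel.1 | grep mtu
# Probe with DF set: 1422 bytes of payload + 28 bytes of ICMP/IP headers = 1450
ping -M do -s 1422 -c 3 10.244.1.15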
The Battle: Iptables vs. IPVS
Historically, kube-proxy used iptables to handle Service discovery and load balancing. When a Service VIP (Virtual IP) is hit, iptables rules redirect that packet to a specific Pod IP.
This works fine for 50 services. But what happens when you have 5,000 services? Iptables is a linear list. Evaluating rules becomes O(n). We have seen clusters choke simply because the kernel spent too much time traversing huge iptables chains.
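You can see the problem directly on any node — a quick sketch, assuming the default KUBE-prefixed chain names that kube-proxy generates:

# How many NAT rules has kube-proxy generated? Thousands is not unusual.
iptables-save -t nat | grep -c KUBE
# Time a full dump of the Services chain; on large clusters even listing it is slow
time iptables -t nat -L KUBE-SERVICES -n | wc -l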
The solution in 2019 is IPVS (IP Virtual Server). It is based on hash tables, making lookups O(1) constant time. If you are running high-traffic workloads, you need to enable IPVS mode in kube-proxy.
Configuration: Enabling IPVS
First, ensure the modules are loaded on your host (or CoolVDS instance):
modprobe ip_vs              # core IPVS module
modprobe ip_vs_rr           # round-robin scheduler
modprobe ip_vs_wrr          # weighted round-robin scheduler
modprobe ip_vs_sh           # source-hash scheduler (session affinity)
modprobe nf_conntrack_ipv4  # connection tracking (named nf_conntrack on kernels >= 4.19)
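Verify the modules actually loaded and persist them across reboots. The file path follows the standard systemd modules-load.d convention; the ipvsadm package name below assumes a Debian/Ubuntu host:

lsmod | grep -e ip_vs -e nf_conntrack
# Persist the modules across reboots
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF
# Userspace tool for inspecting the IPVS tables later
apt-get install -y ipvsadm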
Then, update your kube-proxy config map. If you are using kubeadm, you can edit the config directly:
kubectl edit configmap kube-proxy -n kube-system
Look for the mode setting and change it:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"
  strictARP: true
After restarting the kube-proxy pods, Service lookup cost stays constant no matter how many Services you add, and networking performance remains stable under load.
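A minimal sketch of the restart and sanity check, assuming kube-proxy runs as a DaemonSet with the usual k8s-app=kube-proxy label (the kubeadm default):

# Delete the pods; the DaemonSet recreates them with the new mode
kubectl -n kube-system delete pods -l k8s-app=kube-proxy
# Confirm IPVS took over: you should see one virtual server per Service
ipvsadm -Ln
# The startup logs should mention which proxier is in use (ipvs)
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=100 | grep -i proxier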
CNI Selection: Flannel vs. Calico
We often get asked by clients moving their infrastructure to our Oslo datacenter: "Which CNI should I use?"
| Feature | Flannel | Calico |
|---|---|---|
| Model | VXLAN (Overlay) | BGP (Layer 3 Routing) |
| Complexity | Low (Install & Go) | Medium/High |
| Network Policies | No (Open traffic) | Yes (Secure traffic) |
| Performance | Good | Excellent (Near metal) |
For simple dev environments, Flannel is sufficient. However, for production environments requiring security compliance—especially with strict Norwegian data standards—Calico is the superior choice. It allows you to define NetworkPolicies to restrict traffic between Pods. Flannel is wide open by default.
Implementing a Default Deny Policy
Security starts with Zero Trust. Here is how you lock down a namespace using Calico:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Applying this means no pod in the production namespace can accept traffic unless explicitly allowed. This is crucial for GDPR compliance, ensuring that data does not flow to unauthorized microservices.
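Apply it (assuming you saved the manifest above as default-deny.yaml), then punch explicit holes only for the traffic you actually want. The allow-rule below is a hypothetical example — the frontend/backend labels and the port are placeholders for your own services:

kubectl apply -f default-deny.yaml
kubectl -n production describe networkpolicy default-deny

# Example: allow only frontend pods to reach the backend on port 8080
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
EOF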
The Latency Factor: Geography Matters
You can optimize your CNI and kernel settings all day, but you cannot overcome the speed of light. If your users are in Norway and your Kubernetes cluster is hosted in a budget datacenter in Arizona, every round trip costs 150ms or more. For a microservices architecture where a single page load triggers several sequential requests, that penalty compounds.
Local peering is critical. At CoolVDS, we peer directly at NIX (Norwegian Internet Exchange). When a user in Oslo requests data from your app, the packet often never leaves the country. This results in standard ping times of 2-5ms within Norway.
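Measure it rather than guess. A simple check from your office or CI runner — the hostname below is a placeholder for your own ingress endpoint:

# Round-trip time and path from your location to the cluster ingress
mtr --report --report-cycles 10 yourapp.example.com
ping -c 10 yourapp.example.com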
Debugging Network Issues
When things break, kubectl describe isn't enough. You need to get onto the node.
1. Check the Routing Table
Inside the node, ensure the routes to the Pod CIDR ranges exist:
ip route show
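On a healthy Flannel (VXLAN) node you should see one route per remote node's Pod subnet pointing at the flannel.1 device. The addresses below are illustrative, assuming the default 10.244.0.0/16 Pod CIDR — if a node's subnet is missing here, the CNI agent on that node has not registered its routes:

# Expected shape of the routes, one per remote node:
# 10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
# 10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
ip route show | grep flannel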
2. Trace the Packet
Use tcpdump to see if packets are arriving at the interface. If you are using VXLAN (Flannel), listen on the flannel.1 interface:
tcpdump -i flannel.1 -nn port 80
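If nothing shows up, generate traffic yourself from a throwaway pod and watch both nodes. The target Pod IP is a placeholder; busybox wget is enough to trigger a TCP handshake:

# From another terminal: hit the target Pod directly from a test pod
kubectl run -it --rm nettest --image=busybox:1.28 --restart=Never -- wget -T 5 -O- http://10.244.1.15
# Packets leaving the source node but never arriving here usually means MTU mismatch or a firewall between nodes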
3. DNS Debugging
Since Kubernetes 1.11, CoreDNS is the standard (it keeps the legacy k8s-app=kube-dns label for compatibility). If Pods can't resolve service names, check the logs:
kubectl logs -l k8s-app=kube-dns -n kube-system
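Also confirm resolution from inside the cluster with a throwaway pod. busybox is pinned to 1.28 deliberately — later tags ship a broken nslookup:

kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default
# Make sure the DNS Service actually has endpoints behind it
kubectl -n kube-system get endpoints kube-dns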
Why Infrastructure Choice Is Not Just About CPU
Kubernetes is I/O hungry. Etcd (the brain of K8s) requires extremely low disk latency to maintain cluster consensus: every write must be fsynced to its write-ahead log before it is acknowledged. If your VPS provider uses standard HDD or shared SATA SSDs, etcd starts missing heartbeats and triggering leader elections during heavy write bursts.
This is why we standardized on NVMe storage for all CoolVDS instances. In a distributed system, disk latency eventually becomes network latency. If Etcd waits for disk, the API server waits for Etcd, and your deployment hangs.
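You can measure this directly with fio's fdatasync test, the check commonly cited for etcd disks: the 99th percentile sync latency should stay below roughly 10ms. The directory and sizes below follow that common benchmark — adjust the path to wherever your etcd data lives:

# Run on the disk backing etcd; needs ~22MB free. Watch the fsync/fdatasync percentiles.
mkdir -p /var/lib/etcd/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd/fio-test \
    --size=22m --bs=2300 --name=etcd-disk-check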
Conclusion
Kubernetes networking in 2019 is complex, but it is deterministic. By moving from iptables to IPVS, selecting the right CNI for your security needs, and ensuring your underlying infrastructure provides the low latency and high I/O required, you can build a platform that is boringly stable.
Don't let slow I/O or geographic latency kill your architecture. Deploy a test cluster on CoolVDS today—our NVMe-backed instances in Oslo are ready for your workloads.