Kubernetes Networking Deep Dive: Surviving the Packet Flow Hell
Let’s be honest: for 90% of the people deploying Kubernetes right now, the network is just magic. You apply a YAML file, a Service gets an IP, and traffic flows. Until it doesn't. Until you hit a 500ms latency spike between Pods because your provider's virtual switch is choking on VXLAN encapsulation, or your iptables rules have grown so long that the kernel spends more time traversing chains than actually moving packets.
I recently audited a cluster for a fintech startup in Oslo. They were complaining about "microservice timeouts." It wasn't the code. It was the underlying infrastructure buckling under the weight of packet encapsulation on cheap, oversold VPS hosts.
If you are building infrastructure in 2019, you cannot treat Kubernetes networking as an abstraction. You need to understand the Linux plumbing underneath. Today, we are going deep into CNI choices, the iptables vs. IPVS debate, and why your hosting provider's "noisy neighbors" are killing your throughput.
The CNI Dilemma: VXLAN vs. BGP
The Container Network Interface (CNI) is where the rubber meets the road. In the current landscape, you essentially have two architectural choices: Encapsulation (Overlay) or Direct Routing.
1. The Easy Way: VXLAN (Flannel/Weave)
Most tutorials tell you to install Flannel. It builds an overlay network using VXLAN. Every packet leaving a pod is encapsulated in a UDP packet, sent across the host network, and decapsulated on the other side. It works everywhere, but it comes with a CPU tax.
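You can see the overlay tax for yourself. On a Flannel node, inspect the VXLAN device and watch the encapsulated traffic on the host NIC (the interface names here are assumptions: Flannel defaults to flannel.1 and UDP port 8472, but verify on your setup):
# Show the VXLAN device and its tunnel parameters
ip -d link show flannel.1
# Watch pod traffic wrapped in UDP on the host interface
tcpdump -i eth0 -nn udp port 8472
Every packet on port 8472 is a pod packet wearing a UDP costume, and every wrap and unwrap burns CPU cycles.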
2. The Performance Way: BGP (Calico)
If you care about latency—and if you are hosting real-time applications in Norway, you should—you want flat, routed networking. Calico allows you to use BGP to distribute routes. No encapsulation. Just pure IP routing.
However, BGP requires a network that allows it. Many budget VPS providers block BGP or filter unknown MAC addresses aggressively. This is why we deploy on CoolVDS KVM instances; they act like proper bare metal, allowing us to manage our routing tables without the hypervisor playing nanny.
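For reference, disabling encapsulation in Calico is a one-field change on the IP pool. A minimal sketch, assuming Calico v3.x and calicoctl; the CIDR is an example value that must match your cluster's pod range:
# ippool.yaml (apply with: calicoctl apply -f ippool.yaml)
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16  # example pod CIDR, match your cluster
  ipipMode: Never       # no encapsulation, routes go out via BGP
  natOutgoing: true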
Pro Tip: If you must use VXLAN, ensure your MTU is configured correctly. Standard Ethernet is 1500 bytes. VXLAN adds 50 bytes of overhead. If your CNI tries to push 1500 bytes inside a VXLAN packet, you get fragmentation. Set your inner MTU to 1450.
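You can verify the math from inside a pod with the Don't Fragment bit. ICMP adds 28 bytes of headers, so on a 1450-byte MTU a 1422-byte payload should pass and 1423 should not (the target pod IP below is an example):
ping -M do -s 1422 10.244.1.5  # fits in 1450, should succeed
ping -M do -s 1423 10.244.1.5  # exceeds 1450, fails with "message too long"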
The Bottleneck: Why you should ditch iptables for IPVS
kube-proxy's IPVS (IP Virtual Server) mode went GA in Kubernetes 1.11, and it is solid in the 1.13 builds we run today. This is critical.
By default, kube-proxy implements Services with iptables rules. With 1,000 Services, the kernel walks a sequential chain of rules for the first packet of every new connection: O(n) complexity. I've seen connection establishment latency jump to 50ms purely from iptables overhead.
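You can measure the depth of the swamp on any node (assumes shell access and the iptables tooling installed):
# Count the NAT rules kube-proxy has programmed
iptables-save -t nat | grep -c KUBE
# Time a full dump; on rule sets in the tens of thousands this gets painful
time iptables-save -t nat > /dev/null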
IPVS is based on hash tables. It provides O(1) complexity. It doesn't matter if you have 10 services or 10,000; the lookup time is constant.
How to enable IPVS in 2019
If you are spinning up a cluster using kubeadm, you need to explicitly tell it to use IPVS. Do not leave this on default.
# Create a configuration file for kube-proxy
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr"  # Round Robin is usually fine, but 'lc' (least connection) is better for long sessions
  strictARP: false
  syncPeriod: 30s
Before applying this, make sure the IPVS kernel modules are loaded on your CoolVDS nodes:
# Load required modules
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
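After restarting kube-proxy, confirm that IPVS is actually in play (ipvsadm is a separate package on most distros):
# Verify the modules stuck
lsmod | grep ip_vs
# List the virtual servers kube-proxy programmed for your Services
ipvsadm -Ln
If ipvsadm -Ln shows your Service ClusterIPs as virtual servers with pod endpoints behind them, you are off the iptables treadmill.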
Debugging Network Latency on the Node
When a developer claims "the network is slow," don't guess. Measure. One of the most powerful tools in our arsenal is nsenter. It allows you to step into the network namespace of a specific container from the host command line.
First, find the Process ID (PID) of the container:
docker inspect -f '{{.State.Pid}}' k8s_my-pod_name_...
Let's say the PID is 12345. Now, enter its network namespace and run tcpdump exactly as the container sees it:
nsenter -t 12345 -n tcpdump -i eth0 -w /tmp/capture.pcap
Analyze that PCAP. Are you seeing retransmissions? That usually points to a noisy neighbor on your host stealing CPU cycles, causing the virtual NIC to drop packets. This is a plague on shared hosting.
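You can put a number on it instead of eyeballing the capture. One option (an assumption on my part; any pcap analyzer works) is Wireshark's CLI:
# Count TCP retransmissions in the capture
tshark -r /tmp/capture.pcap -Y tcp.analysis.retransmission | wc -l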
The Storage Network: etcd needs NVMe
Kubernetes networking relies entirely on the state stored in etcd. If etcd is slow, network updates (like new Service endpoints) propagate slowly. And etcd is brutally sensitive to disk write latency: it calls fsync on every commit to its write-ahead log to guarantee consistency.
If you run etcd on a standard HDD or a cheap SATA SSD shared with 50 other tenants, your cluster will become unstable. etcd's default heartbeat interval is just 100ms; when fsync stalls, followers miss heartbeats and the cluster starts holding leader elections instead of serving your API calls.
We benchmarked this. On standard cloud storage, etcd commit latencies often spike above 100ms. On CoolVDS NVMe instances, we consistently see commit latencies under 2ms. This difference is why our clusters survive traffic spikes while others enter a crash loop.
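You can reproduce this measurement yourself with fio, using the fdatasync workload the etcd team recommends; the 99th percentile fsync latency should stay under 10ms. A sketch, assuming fio is installed and the target directory sits on the disk etcd will use:
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-bench --size=22m --bs=2300 \
    --name=etcd-disk-check
# Read the fsync/fdatasync latency percentiles in the output, not just the average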
Optimizing etcd for unstable networks
If you are forced to run on slower hardware, you can tune the heartbeat, but it's a band-aid:
# /etc/kubernetes/manifests/etcd.yaml
command:
- etcd
- --heartbeat-interval=250
- --election-timeout=2500
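Both values are in milliseconds. Keep the election timeout roughly ten times the heartbeat interval, and set the same values on every etcd member, otherwise you just trade slow-disk flapping for messy elections.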
GDPR & Data Residency: The Norwegian Context
Since GDPR came into force last year, controlling where your data flows is mandatory. You cannot have traffic accidentally routing through a US-based load balancer when the payload contains sensitive personal data.
Use Kubernetes NetworkPolicies to enforce a default-deny stance. This ensures that even if a pod is compromised, it cannot exfiltrate data to the internet unless explicitly allowed.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: sensitive-apps
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
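One gotcha: this default-deny also blocks DNS, so pods in sensitive-apps cannot resolve anything until you open it up. A minimal sketch of the companion policy, assuming your cluster DNS runs in kube-system with the standard k8s-app: kube-dns label:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: sensitive-apps
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}   # any namespace...
      podSelector:
        matchLabels:
          k8s-app: kube-dns   # ...but only the DNS pods
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53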
Furthermore, hosting on CoolVDS ensures your physical bits stay in Oslo. With direct peering to NIX (Norwegian Internet Exchange), your latency to local users is minimal, and you satisfy Datatilsynet requirements regarding data residency.
Conclusion: It's All About the Foundation
Kubernetes is powerful, but it introduces massive complexity. Overlays on top of overlays. To run a stable cluster in production in 2019, you need to strip away the uncertainty.
Use IPVS to bypass the iptables swamp. Use Calico for cleaner routing. And most importantly, run your nodes on infrastructure that respects your need for raw I/O and consistent CPU performance.
Don't let Packet Per Second (PPS) limits on budget hosting throttle your success. Deploy a high-performance NVMe KVM instance on CoolVDS today and give your Kubernetes cluster the foundation it deserves.