Kubernetes Networking Deep Dive: Surviving the Overlay Chaos
If you think Kubernetes networking is magic, check your iptables rules on a worker node. Go ahead, run iptables-save | wc -l. If you are running a standard kube-proxy setup with a few dozen services, that number is probably terrifying. It’s not magic; it’s a precarious house of cards built on legacy Linux kernel primitives.
As a Systems Architect deploying clusters across the Nordics, I see the same story repeatedly. A team spins up a cluster using default settings. It works fine in dev. Then they hit production traffic, and suddenly DNS latency spikes, connection timeouts occur randomly, and the “network is slow” ticket lands in my queue.
In 2021, we have better tools. We have eBPF. We have IPVS. But none of that matters if your underlying infrastructure is fighting you. This is a deep dive into making packets flow efficiently in a Kubernetes environment, specifically for those of us running on bare metal or high-performance VPS in Europe, where latency to the NIX (Norwegian Internet Exchange) actually matters.
The CNI Battlefield: Flannel vs. Calico vs. Cilium
The Container Network Interface (CNI) is where the rubber meets the road. If you are still using Flannel (VXLAN) in 2021 for high-throughput production, stop. It’s simple, but the encapsulation overhead is a tax you don’t need to pay.
For a recent project involving financial data aggregation in Oslo, we benchmarked the three heavyweights. Here is the reality:
| CNI Plugin | Technology | Best For | The Catch |
|---|---|---|---|
| Flannel | VXLAN (Encapsulation) | Simple labs, low traffic. | High CPU overhead per packet. No network policies. |
| Calico | BGP / IPIP | Standard Production. | BGP complexity at scale. IPIP still has overhead. |
| Cilium | eBPF | High Performance / Security. | Kernel version dependency (requires 4.19+). |
My recommendation? If your kernel supports it, move to Cilium. It bypasses the iptables mess entirely by using eBPF (Extended Berkeley Packet Filter). It allows the kernel to process packets at the socket layer without traversing the entire network stack.
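If you want to trial Cilium, the prerequisite check and install are quick. A minimal sketch, assuming Helm 3 and a 2021-era 1.10.x release; pin whatever chart version you actually validate:

```bash
# The eBPF datapath wants a modern kernel; Cilium recommends 4.19 or newer
uname -r

# Install via the official Helm chart (version shown is illustrative)
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.10.5 \
  --namespace kube-system \
  --set kubeProxyReplacement=probe
```

The `probe` setting lets Cilium take over whatever the node kernel supports and fall back gracefully; switch to `strict` only once you have verified every node runs a capable kernel.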
Implementing IPVS Mode
If you are stuck with kube-proxy in its default iptables mode, the kernel evaluates service rules sequentially: O(n) complexity. When you have 5,000 services, that lookup time kills your latency. IPVS (IP Virtual Server) uses a hash table instead: O(1) complexity, so lookups stay flat no matter how many services you add.
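Before flipping the switch, it is worth confirming what your nodes are actually doing today. The commands below assume kube-proxy's metrics endpoint is on its default port (10249); adjust if you have rebound it.

```bash
# Rough measure of how many service rules kube-proxy has programmed on this node
sudo iptables-save | grep -c KUBE-SVC

# Ask kube-proxy which proxy mode it is actually running
# (10249 is the default metrics bind port)
curl -s http://localhost:10249/proxyMode
```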
Here is how you force kube-proxy to use IPVS in your cluster configuration (ConfigMap):
```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true      # required if you front services with MetalLB in L2 mode
  tcpTimeout: 0s       # 0s keeps the kernel defaults
  tcpFinTimeout: 0s
  udpTimeout: 0s
```

Note: You must ensure the IPVS kernel modules are loaded on your underlying VPS nodes before applying this. On a CoolVDS instance running Ubuntu 20.04, you would prep the node like this:
modules="ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4"
for mod in $modules; do
sudo modprobe $mod
doneIngress: The Bottleneck You Forgot
You have optimized pod-to-pod communication. Great. Now, how does traffic get in? If you are hosting on AWS, you just request a LoadBalancer and pay the tax. But for those of us leveraging cost-effective European VPS infrastructure, we often use NGINX Ingress Controller paired with MetalLB.
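For reference, MetalLB's Layer 2 setup (in its 2021 ConfigMap-based configuration) is only a few lines; the address range below is a documentation placeholder, so swap in public IPs that are actually routed to your nodes:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      # Placeholder range (TEST-NET-1); replace with IPs routed to your nodes
      - 192.0.2.10-192.0.2.20
```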
The default NGINX config is designed for compatibility, not speed. I've seen massive throughput gains just from raising the worker connection and open-file limits and enabling upstream keepalives in the NGINX ConfigMap. Without this, your high-performance NVMe storage is wasted because the network gateway is choking.
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
data:
  worker-processes: "auto"
  max-worker-connections: "10240"
  max-worker-open-files: "65535"
  keep-alive: "60"
  upstream-keepalive-connections: "100"
  compute-full-forwarded-for: "true"
  use-forwarded-headers: "true"
```

Pro Tip: In Norway, data sovereignty is critical post-Schrems II. By terminating TLS at your Ingress controller on a Norwegian VPS, you keep the entire public leg of the connection encrypted before traffic ever reaches your internal pods. Ensure your TLS certificates are managed via cert-manager with Let's Encrypt for automation.
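If you take the cert-manager route, a ClusterIssuer along these lines is enough for automated HTTP-01 issuance through the NGINX ingress (the email and secret name below are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com               # placeholder; use a monitored address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key # ACME account key is stored here
    solvers:
    - http01:
        ingress:
          class: nginx
```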
The Infrastructure Factor: Why Etcd Fails
Kubernetes networking state is stored in etcd. Every time a pod dies, an IP changes, or a service updates, etcd writes to disk. If that disk write is slow, the API server hangs. If the API server hangs, network updates stop propagating.
I have debugged clusters where the network was blamed, but the culprit was slow disk I/O on a cheap VPS. The fsync latency was exceeding 10ms, causing etcd leader elections to fail. The network overlay wasn't broken; the brain of the cluster was having a stroke.
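Before blaming the overlay, measure the disk yourself. The fio job below is a commonly used approximation of etcd's WAL write pattern (small sequential writes, each followed by fdatasync); the 99th-percentile fdatasync latency it reports should stay well under 10ms:

```bash
# Small sequential writes with an fdatasync after each, roughly etcd's WAL pattern
mkdir -p /var/lib/etcd-bench
fio --name=etcd-fsync --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 \
    --size=22m --bs=2300

# Read the fsync/fdatasync percentile table in the output;
# a p99 above ~10ms means leader elections and heartbeats will suffer.
rm -rf /var/lib/etcd-bench
```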
This is where hardware choice becomes a networking decision. We run our production workloads on CoolVDS for one reason: NVMe storage. Standard SSDs often choke under the random write intensity of etcd, especially in a busy cluster. CoolVDS NVMe instances provide the IOPS required to keep etcd latencies well below the danger zone.
Kernel Tuning for High Traffic
Finally, standard Linux distros are tuned for general-purpose use, not for pushing millions of packets per second. If you are building a cluster on CoolVDS, inject these sysctl settings into your node initialization process (via cloud-init or Ansible):
```ini
# /etc/sysctl.d/99-k8s-network.conf
# Apply with: sudo sysctl --system

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Maximize the backlog of incoming connections
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 8192

# Increase max open files
fs.file-max = 2097152
```

Conclusion: Deterministic Performance
Kubernetes networking is complex, but it shouldn't be unpredictable. By switching to IPVS, utilizing eBPF where possible, and tuning your NGINX ingress, you eliminate software bottlenecks. However, software optimization cannot fix noisy neighbor issues or slow disks.
For serious projects targeting the Norwegian market, you need infrastructure that respects physics. Low latency to Oslo and high I/O throughput aren't luxuries; they are requirements for a stable Kubernetes control plane.
Don't let I/O wait times kill your cluster's network performance. Spin up a CoolVDS NVMe instance today and see what 0.1ms disk latency does for your etcd stability.