Taming the Hydra: A Deep Dive into Kubernetes Networking Performance

Cluster networking is usually where the "Cloud Native" abstraction starts to leak. You deploy a perfectly containerized microservice, apply your YAMLs, and suddenly you're chasing 500ms latency spikes or intermittent DNS failures that disappear when you run tcpdump. I have seen production clusters in Oslo bring powerful servers to their knees not because of CPU load, but because the conntrack table was full. It is messy. It is complex. And if you ignore the low-level details, it will break your SLA.
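
If you suspect the same failure mode, checking conntrack pressure takes thirty seconds. A quick sketch using the standard nf_conntrack counters on a modern kernel:

# Current number of tracked connections vs. the kernel limit
sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max

# The kernel logs "nf_conntrack: table full, dropping packet" when it overflows
dmesg | grep -i conntrack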

In 2022, we are arguably in the "Golden Era" of the Container Network Interface (CNI). We aren't just stitching Docker bridges together anymore. We have eBPF emerging as a standard, Service Meshes like Istio becoming mainstream, and the removal of Dockershim forcing us all to get comfortable with containerd. But all the software-defined networking (SDN) wizardry in the world cannot fix a noisy neighbor on your host node.

The CNI Battlefield: Calico vs. Cilium

Your choice of CNI determines your cluster's heartbeat. For years, Flannel was the default for "I just want it to work," and Calico was for "I need BGP and policies." But the landscape has shifted. If you are running high-throughput workloads—think real-time financial data processing or high-traffic e-commerce serving Norway—you need to look beyond simple encapsulation.

Calico (The Reliable Standard)

Calico uses BGP (Border Gateway Protocol) to route packets without the overhead of encapsulation (IP-in-IP or VXLAN), assuming your underlying network supports it. On CoolVDS KVM instances, where you have significant control over the Layer 2/3 environment, running Calico in pure Layer 3 mode is incredibly efficient.
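
As a minimal sketch, a Calico IPPool with both encapsulation modes disabled (applied with calicoctl; the pool name and CIDR are placeholders for your own pod network) looks like this:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16   # placeholder pod CIDR
  ipipMode: Never        # no IP-in-IP encapsulation
  vxlanMode: Never       # no VXLAN encapsulation
  natOutgoing: true
  nodeSelector: all()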

Pro Tip: If you are seeing latency on `CrossSubnet` traffic, check your MTU settings. VXLAN adds roughly 50 bytes of overhead per packet, so pods that keep the standard 1500 MTU on top of a 1500-byte VDS interface end up fragmenting or dropping packets. Always clamp MSS or set the MTU explicitly in your CNI config map.
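
For a manifest-based Calico install, the veth MTU typically lives in the calico-config ConfigMap; a hedged example leaving 50 bytes of headroom for VXLAN on a 1500-byte interface:

kubectl -n kube-system patch configmap calico-config \
  --type merge -p '{"data":{"veth_mtu":"1450"}}'

# Restart calico-node so newly created pod interfaces pick up the value
kubectl -n kube-system rollout restart daemonset calico-node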

Cilium (The eBPF Challenger)

Cilium has changed the conversation by bypassing iptables entirely using eBPF (Extended Berkeley Packet Filter). This allows the kernel to process packets at lightning speed without traversing the entire network stack. In 2022 benchmarks, we are seeing Cilium outperform kube-proxy significantly when service counts exceed 5,000.

Here is how you verify what mode your Cilium agent is running in, ensuring eBPF is actually active:

kubectl -n kube-system exec -ti cilium-xxxxx -- cilium status

# Look for this output:
# KubeProxyReplacement:   Strict   [eth0 (Direct Routing)]
# IPv6 BIG TCP:           Enabled
# BandwidthManager:       Enabled
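
If `KubeProxyReplacement` shows `Disabled` or `Partial`, it was not enabled at install time. One way to turn it on is via the upstream cilium Helm chart; a sketch, with the API server address and port as placeholders for your environment:

helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=<API_SERVER_IP> \
  --set k8sServicePort=6443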

The `kube-proxy` Bottleneck: Iptables vs. IPVS

Most Kubernetes installations still run `kube-proxy` in iptables mode. That is fine for a dev cluster. It is disastrous for a production cluster with thousands of services. iptables evaluates its rules as a sequential list, which is O(n) complexity: every packet has to run the gauntlet of rules. If you have 5,000 services, that per-packet processing time adds up to measurable latency.
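
You can gauge the scale of the problem on any worker node by counting the NAT rules kube-proxy has programmed (the KUBE-SVC and KUBE-SEP chain prefixes are standard kube-proxy naming):

# Number of service and endpoint rules programmed by kube-proxy
iptables-save -t nat | grep -cE 'KUBE-SVC|KUBE-SEP'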

IPVS (IP Virtual Server) is the solution. It uses hash tables (O(1) complexity). It doesn't care if you have ten services or ten thousand. The lookup time is constant. Switching to IPVS is often the single biggest network performance upgrade you can make without changing hardware.

To enable IPVS, you need to ensure the kernel modules are loaded on your worker nodes before the kubelet starts. On a CoolVDS instance running Ubuntu 20.04 or 22.04, you would automate this setup:

# Load required modules
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack

# Verify they are loaded
lsmod | grep -e ip_vs -e nf_conntrack
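
A plain modprobe does not survive a reboot, so persist the list via systemd's modules-load mechanism as well:

# Persist across reboots (loaded by systemd-modules-load at boot)
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF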

Then, update your kube-proxy config map:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
  scheduler: "rr" # Round Robin
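
After restarting kube-proxy, confirm the mode actually switched; kube-proxy reports it on its metrics port, and ipvsadm (from the ipvsadm package) lists the virtual servers:

# Pick up the new ConfigMap
kubectl -n kube-system rollout restart daemonset kube-proxy

# Run on a worker node; should return "ipvs"
curl -s http://localhost:10249/proxyMode

# List IPVS virtual servers and their backends
ipvsadm -Ln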

Ingress and The "Noisy Neighbor" Reality

Let's talk about the hardware underneath your Kubernetes cluster. You can tune your sysctls until you are blue in the face, but if the hypervisor hosting your node is stealing CPU cycles to service another tenant, your network throughput will jitter. This is the "Steal Time" (%st) metric in top.

Packet processing is CPU intensive. Interrupts (IRQs) need to be handled immediately. If the CPU is busy waiting for the hypervisor, the NIC ring buffer fills up, and packets get dropped. This is invisible to your application logs—it just looks like the network "paused."
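
Both symptoms are visible once you know where to look: mpstat breaks out %steal per CPU, and the NIC counters reveal ring buffer overruns (eth0 is an assumed interface name; adjust for your node):

# Per-CPU steal time, sampled every second for five seconds
mpstat -P ALL 1 5

# Drops, misses and FIFO overruns reported by the NIC driver
ethtool -S eth0 | grep -iE 'drop|miss|fifo'
ip -s link show eth0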

This is where the architecture of your VPS provider becomes critical. At CoolVDS, we don't oversubscribe cores on our NVMe instances. When we say you have 4 vCPUs, those cycles are yours. This consistency is mandatory for handling ingress traffic, especially if you are terminating SSL/TLS at the ingress controller level (using NGINX or Traefik).

Here is a snippet for NGINX Ingress tuning that we use to handle high-concurrency traffic typical of Nordic e-commerce giants during Black Friday sales:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  worker-processes: "4" # Match your VDS core count
  max-worker-connections: "65536"
  keep-alive: "65"
  upstream-keepalive-connections: "100"
  compute-full-forwarded-for: "true"
  use-forwarded-headers: "true"

Local Context: Latency and GDPR in Norway

Technical architecture does not exist in a vacuum. If you are operating in Norway, you are dealing with two external pressures: Schrems II and NIX Latency.

Legally, relying on US-owned cloud hyperscalers for hosting personal data is becoming a compliance minefield. The Datatilsynet (Norwegian Data Protection Authority) has been increasingly strict regarding data transfers. Hosting your Kubernetes worker nodes on local infrastructure like CoolVDS ensures data residency. Your persistent volumes (PVs) stay on disks physically located in Oslo or nearby European data centers, simplifying your GDPR compliance posture.

Technically, physics wins. If your user base is in Scandinavia, routing traffic to Frankfurt or London adds 20-30ms of round-trip time. By hosting locally, you drop that to <5ms. For a Kubernetes cluster doing database replication or handling synchronous API calls, that reduction in latency significantly improves the "snappiness" of the application.

Conclusion: Stop Guessing, Start Measuring

Kubernetes networking is robust, but it is not magic. It requires deliberate choices regarding CNI, proxy modes, and underlying infrastructure. Don't settle for default configurations that were designed for minikube.

If you are tired of unexplained latency and want a platform where %st is virtually zero, you need infrastructure that respects your need for raw performance.

Ready to test your network throughput? Deploy a high-performance NVMe KVM instance on CoolVDS today and run your own iperf3 benchmarks against the giants. The results will speak for themselves.