Surviving the Kubernetes Networking Maze: From CNI Wars to IPVS

I once spent 72 hours awake because a microservice in a cluster hosted in Oslo couldn't talk to a payment gateway in Frankfurt. The logs said "timeout." The developers said "it works locally." The monitoring dashboard showed all green. It turned out to be a subtle MTU fragmentation issue caused by a double-encapsulation scenario between the overlay network and the provider's cheap virtual switch.

Kubernetes networking is magic until it breaks. Then, it is a tangled mess of iptables rules, virtual ethernet pairs, and routing tables that can make grown sysadmins cry. If you are running default settings in production, you are sitting on a time bomb. We are going to rip open the abstraction layer and look at what actually happens to a packet when it hits your cluster.

The CNI Performance Trap: VXLAN vs. Direct Routing

Most tutorials tell you to just "apply the YAML" for Flannel or Calico and move on. In a production environment, especially one serving latency-sensitive users in Norway or Northern Europe, that is negligence. By default, many CNI (Container Network Interface) plugins use VXLAN (Virtual Extensible LAN) to build an overlay network.

VXLAN encapsulates your packet inside a UDP packet. This adds overhead. It consumes CPU cycles for encapsulation/decapsulation and reduces your effective MTU (Maximum Transmission Unit). If your host interface is standard 1500 bytes, and VXLAN takes 50 bytes, your Pod needs an MTU of 1450.

If your application tries to push a full 1500-byte payload, it gets fragmented or dropped if the DF (Don't Fragment) bit is set. This is the silent killer of throughput.

Diagnosing MTU Issues

Don't guess. Check the link inside the pod.

kubectl exec -it my-pod -- ip link show eth0

If you see an MTU of 1440 or 1450, you are running an overlay. For high-performance workloads, like real-time data processing or high-frequency trading platforms, you want Direct Routing (often via BGP). This removes the encapsulation overhead entirely.
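
Before re-architecting anything, prove that the MTU is actually biting. A quick sketch, assuming the pod image ships iputils ping and that 10.0.0.5 is a reachable node or pod IP, is to send a full-size packet with the DF bit set:

kubectl exec -it my-pod -- ping -M do -s 1472 -c 3 10.0.0.5

1472 bytes of ICMP payload plus 28 bytes of headers is exactly 1500. On a 1450-byte overlay interface the kernel refuses to send it and reports "message too long", which is the same failure your application hits silently.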

Here is how you might configure Calico to use BGP peering instead of VXLAN, assuming your underlying infrastructure (like CoolVDS L2 segments) supports it:

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 63400

This lets the nodes exchange pod routes directly over BGP instead of tunnelling traffic. However, it requires a provider that doesn't block unknown MAC addresses or BGP traffic on the private network. This is where "commodity clouds" fail and bare-metal style virtualization shines.
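
The BGPConfiguration only controls peering; encapsulation is switched off per IP pool. Here is a minimal sketch, assuming the default pool and a 192.168.0.0/16 pod CIDR (adjust both to your cluster):

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never
  vxlanMode: Never
  natOutgoing: true

With both modes set to Never, pod traffic leaves the node as plain routed IP, so every hop in between, including the provider's switches, has to know how to reach the pod CIDRs.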

Iptables vs. IPVS: The Scalability Wall

By default, Kubernetes uses kube-proxy in iptables mode. This works fine for 50 services. It works okay for 500. But if you are running a microservices architecture with 5,000 services, iptables becomes a linear bottleneck.

iptables chains are evaluated sequentially (O(n)): every packet has to traverse the rule list until it finds a match. I have seen service latency jump from 2ms to 200ms just because the cluster grew too large.

The solution in 2023 is IPVS (IP Virtual Server). IPVS is built on top of the netfilter framework but uses hash tables (O(1)). It doesn't care if you have 10 services or 10,000; the lookup time remains constant.

Pro Tip: Before switching, ensure the ip_vs kernel modules are loaded on your worker nodes.

lsmod | grep ip_vs

If that returns nothing, you need to load them. On a CoolVDS NVMe instance running Ubuntu 22.04 or Debian 11, you generally have the necessary kernel support out of the box, but always verify.

# Load the IPVS core module plus the scheduling algorithms kube-proxy can use
modprobe ip_vs
modprobe ip_vs_rr      # round robin
modprobe ip_vs_wrr     # weighted round robin
modprobe ip_vs_sh      # source hashing (session affinity)
modprobe nf_conntrack  # connection tracking
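
modprobe only lasts until the next reboot. On systemd-based distributions such as Ubuntu 22.04 or Debian 11 you can persist the list with a modules-load.d drop-in (the file name is arbitrary):

cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF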

Then, edit your kube-proxy configuration map to switch modes:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
  tcpTimeout: 0s
  tcpFinTimeout: 0s
  udpTimeout: 0s
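
How you apply this depends on how the cluster was built. On a kubeadm-provisioned cluster, kube-proxy reads its settings from the kube-proxy ConfigMap in kube-system and runs as a DaemonSet, so the rough sequence looks like this:

# Set mode: "ipvs" in the config.conf key
kubectl -n kube-system edit configmap kube-proxy

# Restart kube-proxy so it picks up the new mode
kubectl -n kube-system rollout restart daemonset kube-proxy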

Changing this one setting can reduce CPU load on your nodes by 30% during high traffic spikes.
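
To confirm the switch actually took effect, list the IPVS virtual server table on a worker node (this assumes the ipvsadm package is installed; kube-proxy itself doesn't need it):

ipvsadm -Ln

Every ClusterIP should appear as a virtual service with its backing pod IPs as real servers. An empty table usually means kube-proxy fell back to iptables because the kernel modules above were missing.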

The Hardware Reality: Why Virtualization Matters

You can tune sysctls until you are blue in the face, but you cannot software-optimize a noisy neighbor. In the Nordic hosting market, many "VPS" providers are actually selling oversold containers (LXC/OpenVZ) where you share the kernel with 50 other customers.

Networking is interrupt-driven. If another customer on the host is getting DDoS'd, the CPU time spent servicing those interrupts starves your kubelet. Your node goes NotReady, pods get evicted, and downtime follows.
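
You can watch this starvation happen: keep an eye on the %irq and %soft columns while the node struggles (mpstat ships in the sysstat package):

mpstat -P ALL 1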

This is why for Kubernetes, I strictly recommend KVM-based virtualization with dedicated resource allocation. CoolVDS uses KVM, meaning your kernel is yours. Your network stack is isolated. When we run benchmarks using iperf3 between two CoolVDS instances in the same datacenter, we see consistent throughput with minimal jitter.

# Start iperf3 -s on the target (10.0.0.5) first; -P 8 opens eight parallel streams
iperf3 -c 10.0.0.5 -t 30 -P 8

If you run that command on a budget VPS during peak hours, you will see the bitrate fluctuate wildly. That fluctuation kills TCP window scaling and ruins application performance.

Optimizing Ingress for Nordic Latency

If your traffic is primarily coming from Norway (Oslo, Bergen, Trondheim) or the broader EU, your Ingress controller needs to be tuned for those network conditions. The default Nginx Ingress configuration is too conservative.

We need to raise keepalive limits and worker file-descriptor limits so connections are reused instead of being torn down and renegotiated on every request; modern TLS handshakes are too expensive to repeat constantly.

apiVersion: v1
kind: ConfigMap
metadata:
  # Name and namespace depend on how the controller was installed
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  keep-alive: "120"
  keep-alive-requests: "10000"
  upstream-keepalive-connections: "200"
  upstream-keepalive-timeout: "60"
  worker-processes: "auto"
  worker-rlimit-nofile: "65535"
  reuse-port: "true"

Specifically, reuse-port is critical. It lets multiple worker processes bind to the same port via SO_REUSEPORT, so the kernel distributes incoming connections across them. On a CoolVDS instance with a high core count, this maximizes multicore utilization.
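
To check that the settings actually landed in the rendered nginx.conf, you can grep it inside the controller pod; the deployment and namespace names here assume a standard ingress-nginx install:

kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- grep -E "reuseport|keepalive_requests" /etc/nginx/nginx.conf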

Local Context: Data Sovereignty & The NIX

Let's talk about the elephant in the server room: GDPR and Schrems II. In 2023, relying on US-owned hyper-scalers is a legal headache for Norwegian companies processing sensitive user data. Datatilsynet (The Norwegian Data Protection Authority) is not lenient.

Hosting your Kubernetes cluster on CoolVDS keeps data within the EEA/Norway legal framework. Furthermore, latency matters. CoolVDS peers directly at NIX (Norwegian Internet Exchange). The round-trip time (RTT) from a user in Oslo to a server in Oslo is often sub-2ms. Compare that to 30ms+ routing to a data center in Ireland or Frankfurt. For database queries or API calls, that physics advantage is unbeatable.

Final Thoughts

Kubernetes is not a "set it and forget it" platform. It requires a deep understanding of the Linux networking stack. You need to verify your MTUs, choose the right proxy mode (IPVS), and most importantly, run it on infrastructure that doesn't steal your CPU cycles.

Don't let slow I/O or noisy neighbors kill your SEO or user experience. You need a solid foundation.

Deploy a high-performance KVM instance on CoolVDS today and see the difference dedicated NVMe storage and unthrottled networking makes for your cluster.