Kubernetes Networking Deep Dive: Taming the Packet Storm on Bare Metal

It is 3:00 AM. PagerDuty is screaming. Your pods are technically "Running," but the latency between your frontend and your Redis cache has just spiked from 2ms to 200ms. You check the logs: nothing. You check the CPU: idle. Welcome to the invisible hell of Kubernetes networking.

In the rush to adopt Kubernetes 1.10 this year, too many teams in Oslo and across Europe are treating the network layer as a black box. They slap Flannel on a cluster, assume the default overlay settings work, and then wonder why their throughput falls off a cliff under load. With GDPR fully enforceable as of last month (May 2018), we can no longer afford sloppy data transit or erratic routing.

I have spent the last week debugging a high-traffic Magento deployment where packet fragmentation was silently killing conversion rates. Here is the raw truth about CNI choices, MTU headaches, and why your underlying VPS architecture matters more than your YAML files.

The CNI Battlefield: Flannel vs. Calico

The Container Network Interface (CNI) is where the rubber meets the road. If you are running on CoolVDS, you have the luxury of raw performance, but you can bottleneck it instantly with the wrong plugin.

Flannel is the default for a reason: it is simple. It creates a VXLAN overlay, and VXLAN encapsulates every packet, adding CPU overhead to every single byte transmitted. For a simple dev environment, it's fine. For high-performance production? Not so much.

Pro Tip: If you need raw throughput, stop using VXLAN. Look for Layer 3 routing solutions like Calico using BGP (Border Gateway Protocol). This routes packets natively without the encapsulation overhead.

Here is a quick comparison based on our internal benchmarks from earlier this month:

Feature          | Flannel (VXLAN)          | Calico (BGP)
Complexity       | Low                      | Medium/High
Overhead         | Encapsulation (High CPU) | Near Native (Low CPU)
Network Policies | No                       | Yes (Crucial for Security)
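
If you want to sanity-check numbers like these on your own cluster, a quick pod-to-pod iperf3 run is usually enough to show the encapsulation tax. The sketch below assumes the publicly available networkstatic/iperf3 image; swap in whatever iperf3 image you trust and substitute the server pod's IP:

# Start an iperf3 server pod (networkstatic/iperf3 is an assumption; any iperf3 image works)
kubectl run iperf3-server --image=networkstatic/iperf3 --restart=Never -- -s

# Find the server pod's IP
kubectl get pod iperf3-server -o wide

# Run the client from a second pod against that IP
kubectl run iperf3-client --image=networkstatic/iperf3 --restart=Never --rm -it -- -c <server-pod-ip>

Run the same test on Flannel and on Calico with otherwise identical nodes and the gap in throughput speaks for itself.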

Deploying Calico for Policy Enforcement

Security is not optional anymore. With the new privacy regulations, you need to isolate namespaces. Flannel doesn't support NetworkPolicy resources out of the box. Calico does.

Here is how we initialize Calico on a fresh 1.10 cluster to ensure we can lock down traffic:

# Download the Calico manifest (Version 3.1)
wget https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml

# Apply it to the cluster
kubectl apply -f calico.yaml

# Verify the daemon set is running
kubectl get ds -n kube-system
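
Once the calico-node pods are up, you can actually enforce isolation. Here is a minimal sketch of a default-deny ingress policy (the production namespace is just a placeholder); it blocks all inbound traffic to every pod in that namespace until you add explicit allow rules:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production    # placeholder namespace
spec:
  podSelector: {}           # empty selector = every pod in the namespace
  policyTypes:
  - Ingress

Apply it with kubectl apply -f and layer narrower allow policies per application on top. On plain Flannel, the API server would accept this object and then silently ignore it.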

The MTU Trap: Where Packets Go to Die

This is the most common configuration error I see. The standard Maximum Transmission Unit (MTU) for an Ethernet interface is 1500 bytes. If you use an overlay network (like VXLAN or IP-in-IP), the overlay adds a header (usually 50 bytes).

If your pod tries to send a 1500-byte packet, it gets encapsulated and the total size becomes 1550 bytes. The physical interface on the host says "No, thanks": the packet is either fragmented or, if the DF bit is set, silently dropped while the sender waits on Path MTU Discovery. Either way, your performance drops by 30-50% instantly.

You must configure your CNI MTU to be lower than the host interface MTU. On CoolVDS instances, where we provide pristine network interfaces, we recommend setting the internal MTU to 1450 if you must use overlays.
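
Where you set this depends on the CNI: Calico reads an mtu value from its CNI config under /etc/cni/net.d/, while Flannel derives the overlay MTU from the host interface. Either way, verify it on the node itself. A quick host-side check, assuming the usual device names (eth0 for the NIC, flannel.1 for Flannel's VXLAN device, tunl0 for Calico IP-in-IP):

# Host NIC: typically 1500
ip link show eth0 | grep -o 'mtu [0-9]*'

# Overlay device: must be lower than the NIC by the encapsulation overhead
# (about 50 bytes for VXLAN, 20 for IP-in-IP)
ip link show flannel.1 | grep -o 'mtu [0-9]*'   # Flannel VXLAN
ip link show tunl0 | grep -o 'mtu [0-9]*'       # Calico IP-in-IP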

You can check the MTU inside a pod with this command:

kubectl exec -it <pod-name> -- ip link show eth0

If it says 1500 and you are using VXLAN, you are in trouble.

Scaling Services: IPVS vs. Iptables

Until Kubernetes 1.8, iptables was effectively the only production mode for kube-proxy. But iptables is a firewall, not a load balancer: it evaluates rules as sequential lists, so when you have 5,000 services, every packet has to traverse a massive chain of rules (O(n) complexity).

As of Kubernetes 1.10 (with GA expected in 1.11), IPVS (IP Virtual Server) is the superior mode for kube-proxy. It uses hash tables for lookups (O(1) complexity), which makes it essential for larger clusters.

To enable IPVS mode, you need to ensure the kernel modules are loaded on your worker nodes before K8s starts:

# Load necessary kernel modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4

# Check if they are loaded
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
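
A modprobe only lasts until the next reboot. On a systemd-based distro you can make the modules persistent with a modules-load.d file (the filename itself is arbitrary):

# /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4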

Then, edit your kube-proxy config map to switch the mode:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

The Hardware Reality: Why IOPS Matter for Networking

You might wonder why a networking post talks about storage. Kubernetes stores its state in etcd. Every network change, every pod spin-up, every service update hits etcd.

If your disk latency is high, etcd slows down. If etcd slows down, the API server hangs. If the API server hangs, network updates to kube-proxy and CNI plugins are delayed. This is why running Kubernetes on cheap, shared storage is a suicide mission.

This is where CoolVDS fits the architectural requirement. By using NVMe storage by default, we eliminate the etcd I/O bottleneck. In Norway, latency to the NIX (Norwegian Internet Exchange) is critical. We see latencies as low as 1-2ms within the Oslo region.
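
Wherever you run, it is worth verifying that your disks are in etcd's comfort zone before blaming the network. Here is a sketch of the commonly used fio test that mimics etcd's write-ahead log (small writes with an fdatasync after each one); the directory and sizes are example values, so point it at a scratch directory on the disk that will hold /var/lib/etcd:

# Measure fdatasync latency the way etcd's WAL stresses the disk
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/mnt/etcd-disk-test --size=22m --bs=2300 \
    --name=etcd-io-check

Watch the fsync/fdatasync latency percentiles in the output; the commonly cited target for etcd is a 99th percentile below roughly 10 ms.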

System Tuning for High Throughput

Finally, don't forget the kernel. Linux defaults are often set for general-purpose computing, not high-traffic container routing. We apply the following sysctl settings on our CoolVDS nodes hosting heavy workloads:

# /etc/sysctl.conf

# Allow IP forwarding (Required for K8s)
net.ipv4.ip_forward = 1

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Increase TCP buffer sizes for high-speed networks
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Enable TCP Fast Open (if supported by your app)
net.ipv4.tcp_fastopen = 3
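
None of this takes effect until the kernel reloads the settings. Apply and spot-check them like so:

# Reload /etc/sysctl.conf without rebooting
sysctl -p /etc/sysctl.conf

# Spot-check a couple of values
sysctl net.ipv4.ip_forward
sysctl net.core.rmem_max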

Conclusion

Kubernetes networking is not set-and-forget. It requires deliberate choices regarding CNI plugins, MTU alignment, and kernel tuning. The difference between a sluggish cluster and a performant one often comes down to these configurations and the quality of the infrastructure underneath.

Do not let packet fragmentation destroy your user experience. If you are ready to test a properly tuned environment, spin up a CoolVDS NVMe instance today. The latency drop will speak for itself.