Kubernetes Networking in Production: Surviving the CNI Jungle on Bare Metal VDS

If you have ever stared at a CrashLoopBackOff caused by a timeout that only happens every third Tuesday, you know the truth: Kubernetes networking is where happiness goes to die. It is an abstraction layer built on top of an abstraction layer, usually running on virtualized hardware that you don't control. It's messy.

I recently audited a setup for a fintech client in Oslo. They were running a mid-sized cluster on a generic European cloud provider. Their issue? Random 502 errors during peak trading hours. The application logs were clean. The load balancers were green. But the packets were dropping.

The culprit wasn't their code. It was the "Steal Time" on their virtual CPUs. The underlying host was overselling resources, and the noisy neighbor next door was starving the Linux kernel's ability to process software interrupts (SoftIRQs) for the virtual network interface. In a containerized environment, packet processing is CPU-intensive. If your hypervisor hesitates, your CNI (Container Network Interface) chokes.
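
If you suspect the same problem on your own nodes, you can see it from inside the guest. The "st" column in vmstat (or top) shows the share of CPU cycles the hypervisor withheld, and the second column of /proc/net/softnet_stat counts packets the kernel dropped because the SoftIRQ backlog overflowed. A rough sketch of the check (values in softnet_stat are hexadecimal, one row per CPU):

# Steal time: the "st" column should stay at or near 0 on a healthy host
vmstat 1 5

# SoftIRQ backlog drops: the second column should not keep growing under load
cat /proc/net/softnet_stat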

This is a deep dive into the mechanics of k8s networking, specifically for those building on self-managed infrastructure like CoolVDS, where you actually get the KVM isolation required to run this stack reliably.

1. The CNI Choice: VXLAN vs. BGP

When you spin up a cluster with kubeadm on a standard VPS, you have to pick a CNI. Most tutorials blindly point you to Flannel. If you are running production workloads, delete Flannel.

Flannel typically uses VXLAN (Virtual Extensible LAN) to create an overlay network. It encapsulates Layer 2 Ethernet frames within Layer 4 UDP packets. This adds overhead: every packet has to be encapsulated on the way out and decapsulated on the way in, which burns CPU and adds per-packet latency. On a VDS where cycles are already contended, that overhead eats straight into throughput.
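
You can confirm whether your current CNI is encapsulating by looking for a VXLAN device on the node. Flannel's default backend creates one called flannel.1, but the exact name depends on your install:

# Any output here means pod traffic is being wrapped in UDP
ip -d link show type vxlan

# The overlay device typically shows an MTU 50 bytes below the physical NIC
ip link show flannel.1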

For serious setups, I use Calico with BGP (Border Gateway Protocol). Instead of wrapping packets, Calico instructs the Linux kernel to route packets natively. It shares routes between nodes just like core internet routers do.

Here is how you check your current BGP peer status if you are running Calico on a 3-node CoolVDS cluster:

sudo calicoctl node status

Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.0.0.2     | node-to-node mesh | up    | 10:23:14 | Established |
| 10.0.0.3     | node-to-node mesh | up    | 10:23:15 | Established |
+--------------+-------------------+-------+----------+-------------+

IPv6 BGP status
No IPv6 peers found.

If you see Established, your nodes are talking directly. If you are using CoolVDS, this traffic stays on the high-speed internal network, meaning your pod-to-pod latency is practically non-existent.
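
You can also verify on any node that the routes to remote pod CIDRs are plain kernel routes rather than tunnels. Calico's BGP daemon (BIRD) tags the routes it installs with proto bird:

# Routes to pod CIDRs on other nodes should point at the peer node's IP
ip route | grep bird

If those routes point at a tunl0 device instead, IP-in-IP encapsulation is still enabled in your IPPool and you are not routing natively yet.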

2. Kube-Proxy: Ditch Iptables for IPVS

By default, Kubernetes uses iptables to handle Service routing. Every Service you create adds rules to the iptables chains. By the time you hit 5,000 Services, every packet is walking a sequential O(n) rule list, and performance falls off a cliff.

The solution is IPVS (IP Virtual Server). It is a kernel-level load balancer that uses hash tables for O(1) lookups. It is faster, more stable, and supports better load balancing algorithms (like least connection).
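
To get a feel for how much chain-walking kube-proxy has already programmed on a node, count the NAT rules it owns. It is a rough proxy for the per-packet work in iptables mode:

# Number of service/endpoint rules kube-proxy maintains in the nat table
sudo iptables-save -t nat | grep -c -e KUBE-SVC -e KUBE-SEP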

To enable this in a kubeadm cluster (assuming you are on Kubernetes v1.19 or v1.20), you need to edit your KubeProxyConfiguration.

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
  excludeCIDRs: null
  minSyncPeriod: 0s
  scheduler: "rr"
  syncPeriod: 30s

Before applying this, ensure the kernel modules are loaded on your underlying OS (Ubuntu 20.04 example):

# Load required modules (on kernels >= 4.19, including Ubuntu 20.04,
# nf_conntrack_ipv4 was merged into nf_conntrack)
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack

# Check they are loaded
lsmod | grep -e ip_vs -e nf_conntrack

Pro Tip: Many "budget" VPS providers use OpenVZ or LXC virtualization. These share the host kernel. You generally cannot load specific kernel modules like IPVS or wireguard on those platforms. This is why CoolVDS uses KVM. You get your own kernel. You control the networking stack. Do not try to run Kubernetes on OpenVZ unless you enjoy pain.
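
On a kubeadm cluster, the configuration snippet above lives inside the kube-proxy ConfigMap in kube-system. A minimal sketch of applying it and confirming the switch, assuming the ipvsadm package is installed on the node:

# Set mode: "ipvs" in the config, then restart the kube-proxy pods
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system rollout restart daemonset kube-proxy

# Confirm: each ClusterIP should now appear as an IPVS virtual server
sudo ipvsadm -Ln | head -20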

3. The ETCD Bottleneck: Why Storage Speed is Networking Speed

This sounds contradictory, but your network performance is tied to your storage I/O. Kubernetes networking state is stored in etcd. When a pod dies and needs to be rescheduled, or when an Ingress route changes, the cluster slams etcd with writes.

If etcd latency spikes because your VPS is running on a spinning HDD or a throttled SATA SSD, the API server hangs. The CNI plugin fails to get an IP address assignment. The network converges too slowly.

We test storage rigorously using fio to ensure it meets the etcd recommended requirement of <10ms fsync latency (99th percentile). Here is the benchmark command you should run on your current host:

mkdir -p test-data
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest

If the results show fsync latency over 10ms, your cluster will become unstable under load. CoolVDS NVMe instances consistently sit around 1-2ms, which is why we recommend them for the control plane nodes.
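
You can also read the live number from etcd itself. A kubeadm control plane exposes etcd's Prometheus metrics on 127.0.0.1:2381 by default (adjust if you changed listen-metrics-urls); the histogram below is the one the <10ms target refers to:

# 99th percentile WAL fsync latency lives in this histogram
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds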

4. Ingress and the "MetalLB" Factor

On bare metal or VDS (like CoolVDS), you don't have an AWS Elastic Load Balancer magically appearing when you request a LoadBalancer service. You need MetalLB.

MetalLB allows your cluster to respond to ARP requests for external IPs. Essentially, your VDS announces "I have this IP address" to the local network.

Here is a basic Layer 2 configuration for MetalLB v0.9.6 (current stable):

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250
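
Once MetalLB is running with that ConfigMap, the quickest sanity check is to expose something as a LoadBalancer and watch it pick up an address from the pool. The nginx deployment here is just a throwaway example:

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=LoadBalancer

# EXTERNAL-IP should come from 192.168.1.240-250 within a few seconds
kubectl get svc nginx -w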

However, once that address pool contains publicly routable IPs, your services are exposed directly to the internet, so you must secure the traffic flow. And since you are hosting in Norway (or targeting Europe), you are bound by the GDPR and the recent Schrems II ruling, which invalidated the Privacy Shield framework and made data transfers to US-owned clouds legally risky.

By hosting on CoolVDS, your data resides physically in Oslo or European datacenters, on hardware owned by a European entity. This simplifies your compliance posture significantly. You terminate TLS at the Ingress controller (NGINX) inside your cluster, and the unencrypted traffic never leaves the private VDS network.

5. Debugging When It Breaks

When networking fails, kubectl logs is rarely enough. You need to get inside the pod's network namespace. Since modern container images are stripped of tools (distroless), learn to use ephemeral debug containers (still an alpha feature on v1.19/v1.20, so you may need to enable the EphemeralContainers feature gate before kubectl debug will work).

# Launch a debugger sidecar with networking tools attached to a target pod
kubectl debug -it target-pod-name --image=nicolaka/netshoot --target=target-container-name

Once inside, use tcpdump to trace the handshake.

tcpdump -i eth0 -n port 80

If you see packets leaving but no SYN-ACK returning, check your Security Groups or Firewall rules. If you see fragmented packets, check your MTU settings. Overlay networks (VXLAN) add 50 bytes of overhead. If your host MTU is 1500 and your CNI MTU is also 1500, packets will drop. Set your CNI MTU to 1450 to be safe.
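
A quick way to prove an MTU problem rather than guess at it is to send do-not-fragment pings between nodes at the size your overlay actually needs. A 1422-byte payload plus 28 bytes of ICMP/IP headers is exactly 1450; 10.0.0.3 is the peer node from the BGP table earlier:

# Should succeed at the overlay MTU...
ping -M do -s 1422 -c 3 10.0.0.3

# ...and fail with "message too long" once the packet exceeds the physical 1500-byte MTU
ping -M do -s 1473 -c 3 10.0.0.3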

Summary

Kubernetes networking is deterministic, but it demands resources. It demands a kernel that isn't fighting for air, storage that writes instantly, and a topology that minimizes hops.

Don't build your production environment on a shared-core VPS that steals your CPU cycles just as your traffic spikes. CoolVDS offers the dedicated resources, KVM isolation, and NVMe I/O throughput required to run a stable, high-performance Kubernetes cluster in compliance with Norwegian standards.

Ready to stop debugging network timeouts? Deploy a high-performance KVM instance in Oslo today.