Kubernetes Networking on Bare-Metal VPS: Escaping iptables Hell
Let’s be honest: Kubernetes networking is often treated as a black box. You apply a CNI manifest, wait for the pods to go green, and pray you never have to debug packet drops inside a VXLAN tunnel at 3 AM. I’ve been there. I once spent 48 hours debugging intermittent timeouts on a financial cluster in Oslo, only to realize the underlying VPS provider had inconsistent MTU settings on their switches that were fragmenting packets silently.
If you are running Kubernetes on managed clouds like GKE or EKS, a lot of this is abstracted away. But if you are a serious engineer building on raw infrastructure—like CoolVDS NVMe instances—to maintain data sovereignty in Norway or simply to cut the "cloud tax," you need to understand exactly how bytes move from Pod A to Pod B.
This is not a "Hello World" tutorial. This is a look at how to architect K8s networking for performance, specifically when running on high-performance Linux VPS environments in 2022.
1. The CNI Battlefield: Calico vs. Cilium
In late 2022, the choice of Container Network Interface (CNI) defines your cluster's performance profile. For years, Calico (typically encapsulating traffic via IPIP or VXLAN) was the standard, and its default dataplane, like kube-proxy, leans heavily on iptables. When you scale to thousands of services, the iptables rules form a sequential list that the kernel must traverse for every packet, and that traversal adds latency.
Enter Cilium and eBPF. Instead of using legacy iptables, Cilium programs the Linux kernel directly to route packets. On our benchmarks using CoolVDS instances, switching from iptables-based routing to eBPF reduced service-to-service latency by roughly 15-20% under high load.
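To make the traversal problem concrete, here is a minimal sketch using a hypothetical excerpt of `iptables-save` output. The chain and rule names mirror what kube-proxy generates, but the specific entries are illustrative; a real cluster with thousands of Services has correspondingly thousands of these rules.

```shell
# Hypothetical excerpt of `iptables-save` on a kube-proxy node.
# Real clusters emit at least one KUBE-SERVICES rule per Service/port.
sample='-A KUBE-SERVICES -d 10.96.0.1/32 -p tcp --dport 443 -j KUBE-SVC-APISERVER
-A KUBE-SERVICES -d 10.96.0.10/32 -p udp --dport 53 -j KUBE-SVC-DNS
-A KUBE-SVC-APISERVER -j KUBE-SEP-NODE1
-A KUBE-SVC-DNS -j KUBE-SEP-DNS1'

# Every packet headed for a Service VIP is matched against the
# KUBE-SERVICES chain top to bottom -- this is the list that grows:
echo "$sample" | grep -c '^-A KUBE-SERVICES'
```

Run the same `grep -c` against a live node's `iptables-save` and watch the count track your Service count; eBPF replaces this linear scan with hash-map lookups.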
If you are deploying on a fresh cluster today (Kubernetes 1.24+), I strongly recommend looking at Cilium. However, if you need rock-solid, battle-tested simplicity, Calico is still a beast. Here is a production-ready configuration snippet for Calico, specifically tuning the MTU to avoid fragmentation on standard VPS networks:
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha offloads watch load from the API server; "none" disables it.
  # Enable it for clusters larger than ~50 nodes.
  typha_service_name: "none"
  # CRITICAL: leave room for IPIP/VXLAN headers inside the host MTU (usually 1500)
  veth_mtu: "1450"
  # "bird" runs BGP between nodes; IPIP encapsulation itself is set per IPPool
  calico_backend: "bird"
Pro Tip: Never assume the MTU is 1500. On virtualized infrastructure, the underlying hypervisor (KVM) overhead might mean your safe payload is 1450 or 1460 bytes. If your CNI pushes 1500 bytes into a tunnel, fragmentation occurs, CPU usage spikes, and throughput collapses. We optimize CoolVDS host networks to handle standard frames efficiently, but your overlay config must match.
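The arithmetic behind that 1450 is worth spelling out. A short sketch of the overhead budget, assuming VXLAN over IPv4 (IPIP costs only 20 bytes, so its safe payload is larger):

```shell
HOST_MTU=1500
# VXLAN adds: outer IPv4 (20) + outer UDP (8) + VXLAN header (8)
# + inner Ethernet frame header (14) = 50 bytes per packet
VXLAN_OVERHEAD=50
echo "safe veth MTU: $((HOST_MTU - VXLAN_OVERHEAD))"

# To verify the real path MTU between two nodes, send a non-fragmentable
# probe (1472 = 1500 - 20 IP - 8 ICMP) and shrink -s until it passes:
#   ping -M do -s 1472 <peer-node-ip>
```

If the `ping -M do` probe fails at 1472 but succeeds lower, your host MTU is below 1500 and the overlay MTU must shrink with it.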
2. Exposing Traffic: MetalLB in Layer 2 Mode
On AWS, you create a service of type LoadBalancer and get an ELB. On a VPS, you get... Pending. This is where MetalLB saves the day. It allows you to simulate a LoadBalancer on bare-metal or VPS environments.
In a standard setup, I use Layer 2 mode. One node in the cluster claims ownership of the external IP via ARP. If that node dies, another takes over. It's simple and effective for clusters that don't need BGP peering with top-of-rack switches.
Here is how you configure the IPAddressPool (note the CRD-based format, which replaced the old ConfigMap approach in MetalLB v0.13):
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.0/24 # Replace with your CoolVDS reserved public IP block
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: public-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
  - public-pool
When you use this, traffic hits the node directly. This is why low latency matters. If your VPS provider oversubscribes CPUs, the ARP responses get delayed, and your failover time increases. CoolVDS guarantees strict resource isolation, meaning your ARP broadcasts are processed instantly.
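Once the pool is applied, confirm the Service actually received an address. Here is a sketch parsing hypothetical `kubectl get svc` output (the Service name and IPs are illustrative stand-ins):

```shell
# Hypothetical output of: kubectl get svc my-service
svc='NAME         TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)
my-service   LoadBalancer   10.96.12.34   192.168.10.7   80:31234/TCP'

# EXTERNAL-IP must come from the IPAddressPool; "<pending>" means
# MetalLB never matched the Service to a pool.
echo "$svc" | awk 'NR==2 {print $4}'
```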
3. The Conntrack Trap
Kubernetes relies heavily on Linux connection tracking (conntrack). Every NAT operation (Service IP to Pod IP) creates an entry. If you run a high-traffic ingress controller or a DNS server, you can fill this table. When the table is full, the kernel drops new packets. No errors in the application logs, just silence.
I recently audited a cluster for a media streaming client that was crashing randomly. It wasn't the app; it was net.netfilter.nf_conntrack_max being stuck at the default 131072. On a high-performance server, bump this up.
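Before raising the limit, measure how close you actually are. The numbers below are stand-ins so the arithmetic is visible; on a live node, substitute the values read from /proc as shown in the comments:

```shell
# On a live node these come straight from the kernel:
#   max:   cat /proc/sys/net/netfilter/nf_conntrack_max
#   count: cat /proc/sys/net/netfilter/nf_conntrack_count
max=131072
count=98304

echo "conntrack table ${count}/${max} ($(( count * 100 / max ))% full)"
```

Anything sustained above roughly 80% is a warning sign: bursts will hit the ceiling and the kernel will start dropping new connections silently.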
Apply this via a DaemonSet or `sysctl` on the host nodes:
# /etc/sysctl.d/k8s-tuning.conf
net.netfilter.nf_conntrack_max = 524288
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
# Reduce TCP keepalive for faster dead connection cleanup
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
4. Why Underlying Hardware Dictates Network Stability
Overlay networks (VXLAN/IPIP) require CPU cycles to encapsulate and decapsulate packets. If you are running on a "budget" VPS where the CPU steal time is high (meaning neighbors are using your cycles), your network throughput will fluctuate wildly. This creates jitter.
For applications hosted in Norway, specifically those dealing with real-time data or heavy API traffic (like OpenBanking APIs), jitter is unacceptable. We built CoolVDS on KVM with dedicated NVMe storage specifically to eliminate this "noisy neighbor" effect. When the CPU is free to process network interrupts immediately, your K8s overlay network feels like physical wiring.
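You can quantify the noisy-neighbor effect yourself: the ninth field of the `cpu` line in /proc/stat is cumulative steal time. The sketch below runs against a sample line; on a real node, read `head -1 /proc/stat` twice a few seconds apart, since it is the *delta* that tells you whether the hypervisor is withholding cycles right now.

```shell
# Sample "cpu" line from /proc/stat; fields after the label are:
# user nice system idle iowait irq softirq steal guest guest_nice
stat_line='cpu  10132153 290696 3084719 46828483 16683 0 25195 120 0 0'

# $9 is steal: ticks during which the hypervisor ran someone else
# on your vCPU while you had work queued
echo "$stat_line" | awk '{print "steal ticks:", $9}'
```

A steal delta that stays at zero under load is what "strict resource isolation" looks like in practice.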
5. Debugging When It All Goes Wrong
When a pod cannot reach another pod, don't guess. Use tcpdump. But since pods are ephemeral, debugging containers is annoying. I use a simple "netshoot" pod for this.
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
Once inside, check DNS first (it's always DNS):
nslookup kubernetes.default
If that works, check the service endpoints:
# Check if the endpoints exist
kubectl get endpoints my-service
# Curl the specific pod IP (bypass Service VIP) to rule out iptables issues
curl -v http://10.244.2.5:8080
Conclusion: Control Your Packets
Kubernetes networking is robust, but it requires a solid foundation. In 2022, relying on managed services is easy, but it comes with a cost—both financial and in terms of control. By building on CoolVDS, you keep your data in Norway (compliant with local regulations) and gain the raw performance needed for eBPF and high-throughput overlays.
Don't let a default configuration limit your throughput. Tune your sysctls, choose the right CNI, and ensure your underlying infrastructure can handle the load.
Ready to build a cluster that doesn't choke on traffic? Deploy a high-performance NVMe instance on CoolVDS today and experience the difference low latency makes.