Kubernetes Networking Deep Dive: CNI, IPVS, and MetalLB on Bare Metal VPS

Stop Treating Kubernetes Networking Like Magic

It is 2020. We are running critical workloads on Kubernetes, yet I still see Senior Engineers treat the network stack like a black box. They apply a CNI (Container Network Interface) manifest they found on GitHub, pray to the gods of DNS, and panic when latency spikes.

If you are deploying Kubernetes on bare metal or high-performance VPS environments here in Norway, the abstraction leaks. Fast. The default settings are designed for compatibility, not for the low-latency requirements of modern fintech or high-traffic e-commerce.

I have spent the last week debugging a cluster where packet drops were happening randomly. The culprit wasn't code; it was the conntrack table limits and VXLAN overhead. Let’s tear down the stack and rebuild it properly.
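
If you suspect the same failure mode, check where your nodes stand before touching any Kubernetes objects. A minimal sketch; the ceiling you set depends on node RAM:

# How big is the conntrack table allowed to get, and how full is it?
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count

# If the count is creeping toward the max, raise the limit (example value only)
sysctl -w net.netfilter.nf_conntrack_max=262144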

The CNI Battlefield: Flannel vs. Calico

When you initialize a cluster with kubeadm, you have a choice of CNI. Many tutorials point you to Flannel because it is simple. Do not use Flannel for production.

Flannel typically uses VXLAN to encapsulate packets. This means every packet sent between pods on different nodes gets wrapped in a UDP packet, sent over the wire, and unwrapped. This encapsulation consumes CPU. On a shared, cheap VPS where CPU steal is high, this kills your network throughput.
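
You can see the cost on any Flannel node. A quick check, assuming the default VXLAN backend and its usual flannel.1 interface name:

# Inspect the VXLAN device Flannel creates; note the reduced MTU
# (typically 1450 on a 1500-byte NIC, because the VXLAN headers eat 50 bytes)
ip -d link show flannel.1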

For serious deployments, we use Calico. Specifically, Calico in BGP mode (if your network supports it) or IPIP (IP-in-IP) if it doesn't. BGP allows direct routing of pod traffic without the encapsulation overhead, assuming your nodes share a Layer 2 domain.

Here is how you actually verify what your CNI is doing. Don't just trust the YAML.

# Check the Calico node status to see BGP peering
sudo calicoctl node status

# Expected output for a healthy 3-node cluster:
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 192.168.1.10 | node-to-node mesh | up    | 12:00:00 | Established |
| 192.168.1.11 | node-to-node mesh | up    | 12:00:05 | Established |
+--------------+-------------------+-------+----------+-------------+
Pro Tip: If you are hosting on CoolVDS, our internal networking allows for low-latency communication between instances. We recommend configuring Calico with CALICO_IPV4POOL_IPIP set to "CrossSubnet". This ensures encapsulation is only used when absolutely necessary, keeping local traffic fast.
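
Note that CALICO_IPV4POOL_IPIP is only read when the default pool is first created. On a cluster that already exists, you change the IPPool resource itself and apply it with calicoctl. A sketch assuming the default pool name and a 192.168.0.0/16 pod CIDR; substitute your own:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16        # must match your existing pod CIDR
  ipipMode: CrossSubnet       # encapsulate only when crossing a subnet boundary
  natOutgoing: true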

The Bottleneck: Kube-Proxy and iptables

By default, kube-proxy in Kubernetes 1.18 uses iptables mode for Service load balancing within the cluster. This works fine for 50 Services. It falls apart at 5,000.

iptables rules form a linear list. Every packet has to traverse the chain until it finds a match, so as you add Services and Pods, the list grows and lookup cost grows with it, O(n). To fix this, you must enable IPVS (IP Virtual Server) mode. IPVS is also built on netfilter, but it uses hash tables for lookups, providing O(1) performance.
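
To see how close you already are to that wall, count the NAT rules kube-proxy has programmed. A rough gauge, run on any node:

# Count the Service-related NAT chains, then the total NAT rule count
sudo iptables-save -t nat | grep -c KUBE-SVC
sudo iptables-save -t nat | wc -l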

Enabling IPVS Mode

First, ensure the kernel modules are loaded on your underlying VPS nodes. This is often missed.

# Load necessary modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack
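
Loading them once is not enough; they have to survive a reboot. On a systemd-based distribution, one way to persist them (the file name is arbitrary):

# /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack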

Next, edit your kube-proxy configuration. If you used kubeadm, it's in a ConfigMap.

kubectl edit configmap kube-proxy -n kube-system

Find the mode field and change it from "" (which defaults to iptables) to "ipvs".

mode: "ipvs"
ipvs:
  strictARP: true

Kill the kube-proxy pods to force a restart. The difference in service resolution time is measurable, especially when your cluster is under load.
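
Concretely, the restart and the follow-up check look something like this, assuming a kubeadm cluster (which labels the pods k8s-app=kube-proxy) and the ipvsadm utility installed on the node:

# Recreate the kube-proxy pods so they pick up the edited ConfigMap
kubectl -n kube-system delete pods -l k8s-app=kube-proxy

# On any node, IPVS virtual servers should now exist for your Services
sudo ipvsadm -Ln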

The Ingress Problem: Life Without AWS ELB

One of the biggest shocks for engineers moving from AWS/GCP to self-hosted infrastructure (like a CoolVDS cluster) is that Services of type LoadBalancer stay in the "Pending" state forever. There is no magic cloud controller to hand you an IP.

In 2020, the standard solution is MetalLB. It allows your Kubernetes cluster to respond to ARP requests for external IPs, effectively announcing "I have this IP" to the network switch.
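
Installing it is two manifests plus a secret for the speakers' memberlist. The version pinned below is an assumption from the 0.9.x line that was current in 2020; check the MetalLB docs for the release you actually deploy:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml

# Only needed on first install: the key the speakers use to secure memberlist traffic
kubectl create secret generic -n metallb-system memberlist \
  --from-literal=secretkey="$(openssl rand -base64 128)"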

Here is a battle-tested Layer 2 configuration for MetalLB.

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 198.51.100.10-198.51.100.20

When you apply this, your NGINX Ingress Controller service will finally pick up an External-IP. Traffic flows directly to the node, and kube-proxy (now using IPVS, hopefully) routes it to the pod.
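
Verify it once the ConfigMap is applied. A quick check; the address below simply echoes the example pool above:

# A Service of type LoadBalancer should now show an address from the pool
kubectl get svc --all-namespaces -o wide | grep LoadBalancer

# From another machine on the same L2 segment, the address should answer
ping -c 3 198.51.100.10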

The Hardware Reality: Why Your VPS Matters

Software optimization only goes so far. Networking is I/O. If your hosting provider over-subscribes the host machine, you suffer from "noisy neighbor" syndrome. When a neighbor saturates the physical NIC or spikes their CPU usage, your packet processing gets delayed.

In Kubernetes, network latency equals application latency. If etcd heartbeats are delayed by network jitter, your cluster might think a node is dead and start unnecessary rescheduling storms. This is catastrophic.
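
If you want numbers instead of a hunch, ask etcd how long a committed proposal takes. A sketch assuming a kubeadm control-plane node, where the client certificates sit under /etc/kubernetes/pki/etcd:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
# Healthy output reports something like "took = 9ms"; if that number jumps
# around under load, the jitter is in your network or disk, not your YAML.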

This is why we architect CoolVDS differently. We prioritize dedicated resource allocation. When you spin up an instance with us, you are getting the raw throughput capabilities of NVMe storage and dedicated CPU cycles. We don't over-subscribe. For a Kubernetes node, consistency is more valuable than burst speed.

Security: The GDPR Firewall

We are in Norway. Datatilsynet does not joke around. With the Privacy Shield situation becoming increasingly unstable, keeping traffic local is paramount.

By default, Kubernetes allows all pods to talk to all pods. This is a security nightmare and potentially a compliance violation if sensitive data crosses boundaries it shouldn't. You need NetworkPolicies.

Start with a default deny policy. Lock it down.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

Then, explicitly allow DNS (UDP port 53) and traffic from your Ingress controller. If you don't allow DNS, your pods will fail to resolve service names, and you will spend hours debugging why curl google.com hangs.
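
A minimal egress policy for DNS, assuming CoreDNS still carries the k8s-app: kube-dns label that kubeadm gives it (TCP 53 is included because DNS falls back to TCP for large responses):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53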

Conclusion

Kubernetes networking in 2020 is robust, but it requires manual tuning to get right on bare metal. Moving from Flannel to Calico, enabling IPVS, and setting up MetalLB gives you a cloud-like experience with the control (and cost savings) of VPS hosting.

Don't let poor infrastructure ruin your architecture. You can have the best YAML in the world, but if the packets drop at the hypervisor level, you lose. Deploy your next cluster node on CoolVDS and see what consistent I/O does for your `etcd` latency.