Beyond the NodePort: Mastering Kubernetes Networking on Bare Metal & VPS

Most developers treat Kubernetes networking as magic. They apply a YAML file, a LoadBalancer IP appears, and they move on. But you aren't running on a managed hyperscaler where the mess is hidden behind a paywall. You are running on raw infrastructure—likely high-performance VPS or bare metal—because you care about performance, cost control, and actually knowing where your data lives.

In a distributed system, the network is the database. If your packet latency spikes because of poor encapsulation choices or CPU contention, your microservices architecture becomes a distributed monolith of failure. I have seen production clusters in Oslo grind to a halt not because of code bugs, but because `iptables` rules hit a limit that no one monitored.

This is a war story about fixing that mess. We are going to look at CNI plugins, the shift to IPVS, and how to handle ingress when you don't have an AWS ELB to save you.

The CNI Decision: Encapsulation is a Tax

When you deploy a standard cluster with tools like `kubeadm`, you need a Container Network Interface (CNI). The default often points you toward overlay networks using VXLAN or IPIP. While these are easy to set up, they impose a "tax" on every packet. The kernel has to encapsulate the original packet inside another packet to traverse the underlying network.

On a standard cloud instance with noisy neighbors, this extra CPU work per packet eats directly into your network throughput. We benchmarked a default Flannel (VXLAN) setup against Calico with encapsulation disabled on CoolVDS NVMe instances. The difference in throughput was nearly 20%.
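
If you want to reproduce that kind of comparison on your own nodes, a crude pod-to-pod iperf3 run is enough to show the gap. The sketch below assumes the public `networkstatic/iperf3` image is acceptable in your cluster; schedule the two pods on different nodes (for example with `spec.nodeName`) so the traffic actually crosses the wire.

# Start an iperf3 server pod and wait for it to come up
kubectl run iperf3-server --image=networkstatic/iperf3 -- -s
kubectl wait --for=condition=Ready pod/iperf3-server

# Point a client pod at the server's pod IP for a 30-second run, then read the numbers
SERVER_IP=$(kubectl get pod iperf3-server -o jsonpath='{.status.podIP}')
kubectl run iperf3-client --image=networkstatic/iperf3 --restart=Never -- -c "$SERVER_IP" -t 30
kubectl logs -f iperf3-client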

If your nodes share a Layer 2 network (which high-quality VPS providers like CoolVDS often support), disable encapsulation. Use direct routing. Here is how you configure Calico to stop wrapping packets and just route them:

apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: Never  # Turn this off
  vxlanMode: Never # Turn this off
  natOutgoing: true

With `ipipMode: Never`, packets leave the pod and traverse the host interface as native traffic. This requires the underlying infrastructure to allow that traffic. If you are stuck on a restrictive host that drops unknown MAC addresses, you are forced into encapsulation. Choose your provider wisely.
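
Assuming `calicoctl` is configured to talk to your datastore, apply the change to the existing default pool and confirm that both encapsulation modes really read `Never` (the file name is just where you saved the manifest above):

# Push the updated IPPool definition
calicoctl apply -f default-ipv4-ippool.yaml

# The IPIPMODE and VXLANMODE columns should both show "Never"
calicoctl get ippool default-ipv4-ippool -o wide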

IPVS: Because Iptables Doesn't Scale

By default, `kube-proxy` uses `iptables` to implement Service routing. That works fine for 50 services. But I recently audited a cluster for a fintech client in Bergen running 3,000 services. The latency was unpredictable.

Why? Because `iptables` is a list. To find a rule, the kernel reads the list sequentially. It is O(n). IPVS (IP Virtual Server), however, uses hash tables. It is O(1). Whether you have 10 services or 10,000, the lookup time is constant.
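
You can see what this means on a live node. The comparison below is a rough gauge of how much `iptables` state kube-proxy maintains versus the IPVS table; the second command needs the `ipvsadm` package installed and kube-proxy already running in IPVS mode.

# Count the rules kube-proxy has programmed in iptables mode (grows with every Service and Endpoint)
iptables-save | grep -c KUBE

# In IPVS mode, the same state lives in constant-time hash tables; list virtual servers and backends
ipvsadm -Ln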

On Kubernetes 1.18 and 1.19, there is no good reason not to run `kube-proxy` in IPVS mode. If you are still on `iptables` mode, you are burning CPU cycles for nothing. Here is the change required in the kube-proxy ConfigMap in `kube-system`:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  excludeCIDRs: []
  minSyncPeriod: 0s
  scheduler: "rr" # Round Robin is usually fine, 'lc' (least connection) is better for long sessions
  strictARP: true
  tcpTimeout: 0s
  tcpFinTimeout: 0s
  udpTimeout: 0s

Pro Tip: Before enabling this, ensure the IPVS kernel modules are loaded on your VPS nodes. Run `lsmod | grep ip_vs`. If the output is empty, load them with `modprobe` (see the sketch below) or your kube-proxy container will crash-loop.
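
A minimal sketch for loading them by hand; the exact conntrack module name differs between kernels (`nf_conntrack_ipv4` on older 4.x kernels, `nf_conntrack` on newer ones), so adjust for your distribution:

# Load the IPVS modules kube-proxy needs for IPVS mode
for mod in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do
  modprobe "$mod"
done

# Persist across reboots
printf "%s\n" ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack > /etc/modules-load.d/ipvs.conf

# Confirm they are loaded
lsmod | grep ip_vs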

Ingress on VPS: Life Without a Cloud Load Balancer

This is the biggest pain point for engineers moving from managed clouds to self-hosted VPS in Norway. You create a Service of type `LoadBalancer`, and it stays in `Pending` state forever. There is no cloud controller to provision an external IP.

In 2020, the standard answer is MetalLB. It allows your Kubernetes cluster to respond to ARP requests for external IPs, effectively turning your Linux node into a router.

However, you need a strategy. Layer 2 mode is the simplest: one node attracts all traffic for an IP. If that node dies, failover takes a few seconds. For a recent e-commerce deployment complying with Norwegian data residency laws, we configured MetalLB to manage a block of floating IPs assigned to our CoolVDS instances.

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.0.2.10-192.0.2.20 # Replace with your public IP block
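
Once the speakers pick up this ConfigMap, pending `LoadBalancer` Services should resolve within seconds. A quick sanity check (the label selector assumes the standard MetalLB manifests):

# Every LoadBalancer Service should now show an EXTERNAL-IP from the pool above
kubectl get svc --all-namespaces | grep LoadBalancer

# Watch a speaker announce the address over ARP
kubectl logs -n metallb-system -l component=speaker --tail=20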

Combine this with an Nginx Ingress Controller utilizing `hostNetwork: true` for maximum performance, bypassing the CNI bridge entirely for ingress traffic. This drops latency significantly, which is critical when serving users from Oslo to Tromsø.
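
What that looks like in practice, trimmed down to the networking-relevant fields; the names, image tag, and layout here are illustrative, so adapt them to whatever ingress-nginx manifest you actually deploy:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  selector:
    matchLabels:
      app: ingress-nginx
  template:
    metadata:
      labels:
        app: ingress-nginx
    spec:
      hostNetwork: true                   # Nginx binds straight to the node's interfaces, no CNI hop
      dnsPolicy: ClusterFirstWithHostNet  # Keep cluster DNS resolution working under hostNetwork
      containers:
      - name: controller
        image: k8s.gcr.io/ingress-nginx/controller:v0.35.0  # Illustrative tag
        ports:
        - containerPort: 80
        - containerPort: 443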

The Hardware Reality: CPU Steal is the Enemy

You can tune software all day, but if your underlying host is oversubscribed, your network performance will tank. Packet processing requires CPU interrupts. If your VPS provider is stealing cycles for another customer, your `softirq` usage spikes, and packets get dropped.

We monitor `st` (steal time) religiously via `top`. Anything above 0.5% is unacceptable for high-performance Kubernetes networking.
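
Outside of an interactive `top` session, the sysstat tools give you the same number in a scriptable form:

# %steal per CPU, refreshed every 5 seconds (mpstat comes from the sysstat package)
mpstat -P ALL 5

# Or watch the "st" column on the far right of vmstat
vmstat 5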

This is where infrastructure choice becomes an architectural decision, not just a billing one. We utilize CoolVDS instances because the KVM isolation is strict. The NVMe storage ensures that when etcd is writing state changes, disk I/O wait doesn't block the CPU from handling network interrupts.

Debugging Network Policies

With the invalidation of Privacy Shield (Schrems II) in July, ensuring data doesn't leak is paramount. Network Policies are your firewall. Deny everything by default.
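
A namespace-wide default-deny is about ten lines of YAML; every allowed flow after that has to be declared explicitly:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: default
spec:
  podSelector: {}        # Empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress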

If things break, don't guess. Use `tcpdump` inside the container namespace. It is the only way to prove if the packet actually arrived.

# Find the virtual interface of the pod (INTERFACE column)
calicoctl get workloadEndpoints -n default -o wide

# Dump traffic on the host interface corresponding to that pod
tcpdump -i cali123456789 -nn port 80
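
If you would rather capture from inside the pod's own network namespace, for example when `eth0` is what you care about, `nsenter` from the host works too. The snippet below assumes a Docker runtime and a placeholder container ID:

# Resolve the container's PID on the host (Docker runtime assumed)
PID=$(docker inspect -f '{{.State.Pid}}' <container-id>)

# Run tcpdump inside that pod's network namespace using the host's binary
nsenter -t "$PID" -n tcpdump -i eth0 -nn port 80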

Conclusion

Kubernetes networking on self-hosted VPS gives you control that managed services hide. You get raw performance, lower latency, and strict data compliance within Norway. But it demands you understand the stack from Layer 2 up to Layer 7.

Stop using default settings. Switch to IPVS. Drop the encapsulation if you can. And ensure your underlying hardware isn't fighting you for CPU cycles.

Ready to build a cluster that doesn't falter under load? Spin up a high-frequency NVMe instance on CoolVDS today and test your network throughput against the giants.