
Kubernetes Networking Deep Dive: Optimizing the Data Plane in Production

The Myth of the "Flat Network"

If you read the Kubernetes documentation, they tell you about the "flat network" model. Every Pod gets an IP. Every Pod can talk to every other Pod. It sounds beautiful, almost utopian. But if you have ever run a cluster with high throughput requirements in production, you know that this abstraction leaks like a sieve.

I recently spent three sleepless nights debugging a microservices architecture for a fintech client based here in Oslo. They were seeing random 502 errors and latency spikes during peak trading hours. The application logs showed nothing. The database was bored. The culprit? Packet drops at the hypervisor level caused by VXLAN encapsulation overhead.

Kubernetes networking is not magic; it is a complex stack of Linux namespaces, bridges, and routing tables. In this post, we are going to rip open the hood of the K8s networking model (as of v1.15), look at CNI plugins, and explain why your choice of infrastructure provider—and specifically their virtualization technology—matters more than your YAML files.

1. The CNI Wars: Flannel vs. Calico

In 2019, if you are spinning up a cluster with kubeadm, your first decision is the Container Network Interface (CNI). This isn't just a checkbox.

Flannel is the default for many. It’s simple. It creates a VXLAN overlay. Essentially, it wraps your Layer 2 frames inside UDP packets to ship them across the host network. This works, but it adds a CPU tax to every single byte sent. On a shared, oversold VPS, this tax is deadly.
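
You can see that tax for yourself by inspecting the VXLAN device flannel creates on each node. A quick check, assuming the default interface name flannel.1:

# Inspect the VXLAN device flannel attaches to each node
ip -d link show flannel.1
# The MTU reads 1450 instead of 1500: the missing 50 bytes are the outer
# IP/UDP/VXLAN headers (plus the inner Ethernet header) wrapped around every packet.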

Calico offers a different approach: pure Layer 3 routing using BGP (Border Gateway Protocol). No encapsulation headers if your network supports it. This is what we run when performance matters.

Here is a typical Calico manifest application for a 192.168.0.0/16 pod network:

kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml

However, simply applying the manifest isn't enough. If you are running on a cloud provider that doesn't allow BGP peering or blocks IPIP traffic, Calico falls back to encapsulation, negating the benefit. This is why we built CoolVDS with KVM and unrestricted internal networking capabilities. We don't block the protocols you need to squeeze out the last millisecond.
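
Before you trust that you are actually running unencapsulated, verify it. A quick check, assuming calicoctl is installed and the stock pool name created by the v3.8 manifest:

# ipipMode should read Never (or CrossSubnet) if pure BGP routing is in effect
calicoctl get ippool default-ipv4-ippool -o yaml | grep -i ipipMode
# BGP sessions to your peers should show "Established"
calicoctl node status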

2. The Bottleneck: iptables vs. IPVS

Out of the box, kube-proxy uses iptables to implement Service virtual IPs and load balancing. This works fine for small clusters. But iptables is a list. A sequential list.

If you have 5,000 services, the kernel has to traverse a massive list of rules for every packet to find where it goes. It is O(n) complexity. I've seen CPU usage on the node hit 100% just processing rules, not serving traffic.
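
You can watch the problem grow on your own nodes. A rough check; every Service and Endpoint adds entries to the NAT table:

# Count the kube-proxy rules in the NAT table
iptables-save -t nat | grep -c '^-A KUBE-'
# Even dumping the table becomes noticeably slow at scale
time iptables-save -t nat > /dev/null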

As of Kubernetes 1.11, IPVS (IP Virtual Server) went GA. IPVS is built on top of the netfilter framework but uses hash tables. It is O(1) complexity. It doesn't care if you have 10 services or 10,000. The lookup time is constant.

If you are serious about performance in 2019, you need to enable IPVS mode in your cluster. But first, your underlying kernel needs the modules loaded. Here is the prep work required on your nodes:

# Load necessary kernel modules for IPVS
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4

# Check if they are loaded
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
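
Anything loaded with modprobe vanishes on reboot. Here is a minimal sketch for making the modules persistent on a systemd-based distro (note that on kernels 4.19 and newer, nf_conntrack_ipv4 has been merged into plain nf_conntrack, so load that instead):

# Persist the IPVS modules across reboots
cat <<EOF > /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4
EOF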

Once the modules are active, you edit your kube-proxy ConfigMap to switch modes:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"
  strictARP: true

Pro Tip: IPVS requires stable connection tracking. If you are hosting on a platform with low limits on nf_conntrack_max, your connections will drop silently when you hit the limit. Check your host's sysctl limits immediately: sysctl net.netfilter.nf_conntrack_max. On CoolVDS NVMe instances, we set this significantly higher by default because we know you aren't running a WordPress blog.
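
Editing the ConfigMap alone does nothing until kube-proxy restarts and picks it up. Here is a sketch of the rollout and verification on a kubeadm-style cluster, where kube-proxy runs as a DaemonSet labelled k8s-app=kube-proxy (you also want the ipset and ipvsadm packages installed on the nodes):

# Apply the new mode and recycle the kube-proxy pods
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system delete pod -l k8s-app=kube-proxy

# On a node, the Service virtual servers should now be visible to ipvsadm
ipvsadm -Ln | head -20

# The kube-proxy logs should mention the ipvs proxier
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs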

3. The Hardware Reality: Why Neighbor Noise Kills K8s

You can optimize your CNI and switch to IPVS, but you cannot software-engineer your way out of bad hardware. Kubernetes is chatty. Health checks, API server calls, etcd replication, and application traffic generate a massive amount of PPS (Packets Per Second).

On many budget VPS providers, you are fighting for CPU time slices (Steal Time). When the hypervisor pauses your VM to let a noisy neighbor run their crypto miner, your network buffer fills up. When it resumes, the buffer overflows, and packets are dropped. In K8s, this looks like a node flapping NotReady or a random timeout.
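
You can usually confirm this pattern from the node itself with standard Linux tooling before blaming Kubernetes. A quick sketch; swap eth0 for your actual interface name:

# CPU steal per core -- %steal should sit near zero on honest hardware
mpstat -P ALL 1 5

# Climbing RX/TX "dropped" counters on the interface are the smoking gun
ip -s link show eth0

# Kernel-level drop and overflow statistics
netstat -s | grep -i -e drop -e overflow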

We saw this with a client trying to run Elasticsearch on a standard cloud provider. The IO wait was low, but query latency was high. The issue was CPU Steal affecting the network interrupt handling.

We moved them to a CoolVDS NVMe instance. Why? Because KVM (Kernel-based Virtual Machine) provides better isolation than container-based virtualization (like Virtuozzo/OpenVZ) for these workloads. Combined with local NVMe storage, the I/O bottleneck evaporated. The latency from their office in Oslo to the server dropped to the theoretical floor of the fiber optics.

4. Optimizing NGINX Ingress for Throughput

Most of you are using the NGINX Ingress Controller. The default configuration is designed for compatibility, not speed. In 2019, with the rise of heavy JSON APIs, you need to tune the buffers.

If an upstream response is larger than the NGINX proxy buffers, NGINX spills it to a temporary file on disk. Even on SSDs, this kills performance compared to RAM. Make sure proxy-buffer-size is large enough for your typical response, and that proxy-body-size covers your largest client request bodies.

Here is a snippet from a tuned nginx-configuration ConfigMap:

data:
  proxy-body-size: "10m"
  proxy-buffer-size: "16k"
  worker-processes: "4"
  keep-alive: "75"
  upstream-keepalive-connections: "32"
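
The tell-tale symptom of undersized buffers is NGINX logging that it buffered an upstream response to a temporary file. A quick way to check for it, assuming the controller runs in the ingress-nginx namespace with the standard labels:

# Look for responses that spilled to disk
kubectl -n ingress-nginx logs -l app.kubernetes.io/name=ingress-nginx --tail=10000 \
  | grep "buffered to a temporary file"

If that grep returns anything during peak traffic, raise proxy-buffer-size until it goes quiet.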

5. Local Compliance and Latency

We are operating in a post-GDPR world. The Norwegian Datatilsynet is watching. Hosting your Kubernetes cluster on US-managed infrastructure adds a layer of legal complexity regarding data transfer that most CTOs want to avoid.

Beyond the legalities, there is physics. Round-trip time (RTT) from Oslo to Frankfurt is ~25 ms; Oslo to Oslo is <2 ms. If your application makes multiple sequential database calls, that ~23 ms difference compounds: a request requiring 10 serial queries picks up roughly 230 ms of extra latency purely from light-travel time if you host abroad.
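
Measure it rather than guessing. A back-of-the-envelope check from any shell (db.example.com stands in for your real database endpoint):

# Average RTT to the database host
ping -c 20 db.example.com | tail -1
# 10 serial queries x ~23 ms of extra RTT per round trip ≈ 230 ms added to every request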

Final Thoughts

Kubernetes is powerful, but it exposes the weaknesses of the underlying infrastructure. Overlay networks amplify the cost of virtualization. If your host is stealing your CPU cycles, your network performance will suffer regardless of your iptables rules.

Don't let your infrastructure be the bottleneck. If you need a sandbox to test IPVS and Calico BGP without noisy neighbors, spin up a high-performance instance. It takes about 55 seconds.

Deploy your K8s node on CoolVDS today.