Kubernetes Networking Deep Dive: eBPF, CNI, and The Hardware Reality
Most "Kubernetes networking tutorials" are dangerous. They show you how to apply a default Flannel manifest and call it a day. Then, six months later, you're waking up at 3:00 AM because your DNS resolution latency hit 200ms and conntrack tables are exhausted.
I've been there. I've debugged clusters where packets vanished into the void because of hairpinned NAT issues. In 2025, running a production cluster without understanding the Container Network Interface (CNI) or the underlying physical transport is professional negligence.
We are going to dissect the stack. No fluff. Just raw packet flows, eBPF maps, and why hosting this on cheap, oversold hardware will kill your control plane.
The Death of iptables and the Rise of eBPF
If you are still relying on pure iptables for K8s service discovery in late 2025, you are bottlenecking your own infrastructure. As Services scale, the sequential rule list in iptables grows linearly, and every new connection has to walk that list before it finds its destination. O(n) complexity is the enemy of low latency.
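You can watch this happen on any node still running kube-proxy in iptables mode. A rough count of the NAT rules kube-proxy has programmed (run on the node itself) looks like this:
# Rules in the per-Service chains programmed by kube-proxy
sudo iptables-save -t nat | grep -c '^-A KUBE-SVC'
# Total NAT rules on this node -- the list every new connection traverses
sudo iptables-save -t nat | wc -l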
The standard now, and what we run by default on our internal clusters, is Cilium leveraging eBPF (Extended Berkeley Packet Filter). eBPF allows us to run sandboxed programs in the Linux kernel without changing kernel source code or loading modules.
Pro Tip: When deploying K8s on CoolVDS KVM instances, ensure you enable the specific kernel headers. Unlike container-based VPS providers (LXC/OpenVZ) that restrict kernel access, our KVM architecture allows full eBPF loader capabilities required for Cilium.
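Before installing Cilium, it is worth confirming the kernel actually exposes what eBPF needs. A minimal sanity check, assuming a Debian/Ubuntu-style kernel config under /boot (other distributions may expose it elsewhere), looks like this:
# Confirm the BPF syscall and JIT are compiled into the running kernel
grep -E 'CONFIG_BPF_SYSCALL|CONFIG_BPF_JIT' /boot/config-$(uname -r)
# Cilium expects the BPF filesystem to be mounted
mount | grep /sys/fs/bpf || sudo mount bpffs /sys/fs/bpf -t bpf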
Replacing Kube-Proxy
The biggest win in 2025 is replacing kube-proxy entirely. Kube-proxy is chatty and slow. Here is how you deploy Cilium to nuke kube-proxy and use eBPF for significantly faster hash-based routing:
helm install cilium cilium/cilium --version 1.16.2 \
--namespace kube-system \
--set kubeProxyReplacement=true \
--set k8sServiceHost=${API_SERVER_IP} \
--set k8sServicePort=${API_SERVER_PORT} \
--set bpf.masquerade=true \
--set ipam.mode=kubernetes
Once applied, check the status. If you see "OK" on KubeProxyReplacement, you just saved yourself milliseconds on every service-to-service hop.
kubectl -n kube-system exec ds/cilium -- cilium status --verbose
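If you only want the one line that matters, grep for it, then list the Services Cilium now resolves in eBPF instead of iptables. On newer Cilium releases the in-pod CLI may be named cilium-dbg; substitute accordingly.
# Just the KubeProxyReplacement line from the agent status
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
# Services now load-balanced in eBPF rather than iptables
kubectl -n kube-system exec ds/cilium -- cilium service list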
The "Etcd" Bottleneck No One Talks About
Networking isn't just about moving packets; it's about knowing where to move them. That state lives in etcd. When a pod dies and a new one spins up, the CNI plugin needs to update routes. This requires a write to etcd.
If your disk I/O is slow, etcd chokes. If etcd chokes, the API server hangs. If the API server hangs, your network updates stall. You end up with "zombie routes": traffic sent to dead pods.
This is where infrastructure choice dictates success. Etcd requires low fsync latency. Spinning rust, or network-attached block storage (Ceph) with noisy neighbors, will miss heartbeat deadlines and trigger spurious leader elections.
We benchmarked this. On standard shared hosting, fdatasync often spikes above 10ms. On CoolVDS High-Frequency Compute, utilizing local NVMe arrays, we consistently see:
# Fio benchmark simulating etcd write patterns
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=mytest
Resulting in 99th percentile latencies under 2ms. That is the difference between a network convergence time of 5 seconds and 500 milliseconds.
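To watch this on a live cluster rather than a synthetic benchmark, scrape etcd's own WAL fsync histogram. The cert paths below are kubeadm defaults and an assumption; adjust them to your deployment:
# The p99 of this histogram decides whether the leader keeps its lease
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  https://127.0.0.1:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds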
The Gateway API Standard
The old Ingress resource is feature-frozen: anything beyond basic host and path matching now lands in the Gateway API. If you are building for the future, you should be using it. It splits responsibilities between the infrastructure provider (us), the cluster operator, and the application developer (you).
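Before writing routes, check what your cluster actually supports. The Gateway API CRDs ship separately from core Kubernetes, and each implementation registers its own GatewayClass:
# GatewayClasses registered by your CNI or ingress controller
kubectl get gatewayclass
# Confirm the Gateway API CRDs are installed at all
kubectl get crd | grep gateway.networking.k8s.io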
Here is a production-ready HTTPRoute that handles traffic splitting, essential for canary deployments without external load balancer logic:
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: production-traffic-split
  namespace: backend
spec:
  parentRefs:
  - name: external-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api/v2
    backendRefs:
    - name: backend-v1
      port: 8080
      weight: 90
    - name: backend-v2
      port: 8080
      weight: 10
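This HTTPRoute attaches to a parent Gateway named external-gateway, which the infrastructure side owns. A minimal sketch of that Gateway, assuming Cilium's GatewayClass and a placeholder TLS Secret, could look like this:
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
  namespace: backend
spec:
  gatewayClassName: cilium          # assumes the Cilium Gateway API implementation
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      certificateRefs:
      - name: backend-tls-cert      # placeholder certificate Secret
    allowedRoutes:
      namespaces:
        from: Same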
This logic happens at the edge of the cluster. But remember: the edge is only as fast as the pipe connecting it.
Norwegian Sovereignty & Network Physics
Let's talk geography. If your users are in Oslo, Bergen, or Trondheim, hosting in Frankfurt is a compromise. Hosting in US-East is an insult.
Latency is determined by the speed of light in fiber, plus serialization and queuing delay at every router in the path. By hosting on CoolVDS in our Oslo facility, you are typically 1-3ms away from NIX (the Norwegian Internet Exchange). This reduces the Round Trip Time (RTT) drastically.
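Don't take RTT claims on faith, ours included; measure from where your users actually sit. A quick path report with mtr against your own edge hostname (the one below is a placeholder) shows both the route and per-hop latency:
# 20-cycle path and latency report toward the cluster edge
mtr --report --report-cycles 20 edge.example.no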
GDPR and Data Sovereignty
Beyond physics, there is the law. With the strict enforcement of Schrems II and the scrutiny of Datatilsynet, moving data outside the EEA (or even outside Norway for certain sectors) is a legal minefield. Keeping traffic local isn't just a performance tweak; it's a compliance requirement.
Kernel Tuning for High-Throughput
Out of the box, the Linux kernel is tuned for general usage, not high-performance Kubernetes networking. If you are pushing gigabits of traffic, you need to adjust your sysctls. We apply these baselines to our managed K8s nodes, but if you run unmanaged VPS, you must do this yourself.
Create a file at /etc/sysctl.d/99-k8s-network.conf:
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Enable TCP BBR congestion control (Available in kernels 4.9+, standard in 2025)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# Increase the maximum number of open files/sockets
fs.file-max = 2097152
# Max backlog for incoming packets (critical for burst traffic)
net.core.netdev_max_backlog = 16384
Apply it:
sysctl -p /etc/sysctl.d/99-k8s-network.conf
Warning: Don't just copy-paste these onto a machine with 2GB of RAM. Larger backlogs, and the socket buffers you will likely pair with them (see the sketch below), consume memory. On a CoolVDS 8GB+ instance, this baseline lets you saturate the 10Gbps uplinks without dropping packets.
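The baseline above deliberately leaves socket buffers at the distribution defaults. If you raise them for sustained 10Gbps flows, the values below are a starting-point assumption, not a prescription; validate them against your own traffic and RAM budget:
# Optional socket buffer ceilings for high-throughput nodes (costs RAM per connection)
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432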
Conclusion: It's All Connected
Kubernetes networking is a stack of dependencies. Your HTTPRoute depends on the Gateway API, which depends on the CNI, which depends on eBPF, which depends on the Kernel, which depends on the CPU and Storage.
You can have the cleanest YAML in the world, but if your host steals CPU cycles or your storage latency spikes, your network will flap. Don't build a skyscraper on a swamp.
Ready to build a cluster that actually stays up? Deploy a high-frequency NVMe instance on CoolVDS today and see what sub-millisecond latency looks like.