
Kubernetes Networking in 2025: From Iptables Hell to eBPF Nirvana on Bare Metal

Let’s be honest. Kubernetes networking is where 90% of cluster abstractions leak. You define a Service, you write an Ingress rule, and you pray that the packet actually lands on the right pod. But when you are running high-traffic workloads targeting the Nordic market, "praying" isn't a strategy. I have spent too many nights debugging CrashLoopBackOff states caused purely by CNI misconfigurations or network timeouts to pretend otherwise.

In 2025, if you are still relying on default iptables modes for clusters with more than 50 nodes, you are voluntarily choosing to suffer. Latency adds up. CPU context switching kills your throughput. And if your underlying VPS provider is overselling CPU cycles, your network performance will be inconsistent at best.

This guide cuts through the noise. We are going to look at eBPF, the Gateway API, and why hardware locality in Norway matters.

The CNI Battlefield: Why eBPF is the Only Logical Choice

Years ago, we debated between Flannel, Calico, and Weave. Today, for high-performance production workloads, the conversation is effectively over. If you care about packet processing speed, you use a CNI based on eBPF (Extended Berkeley Packet Filter). Cilium has effectively won this war.

Why? Because iptables was never designed for the churn of container lifecycles. Updating thousands of rules linearly is O(N). It’s slow. eBPF allows the kernel to process packets without the overhead of traversing the entire netfilter stack.
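To see the difference on a live cluster, compare the pile of KUBE-* chains kube-proxy programs in iptables mode with the single eBPF hash map Cilium uses for services. A rough sketch (on recent releases the agent binary is cilium-dbg; older ones call it plain cilium):

# Count the iptables entries (chains + rules) kube-proxy has programmed in iptables mode
sudo iptables-save | grep -c 'KUBE-'

# With Cilium, service backends live in an eBPF map instead
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf lb list | head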

Deploying Cilium for Maximum Throughput

Don't just run the default install. To really squeeze performance out of your VPS Norway instances, you need to replace kube-proxy entirely. Here is the configuration I use for production clusters running on CoolVDS NVMe instances:

helm install cilium cilium/cilium --version 1.16.0 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=${API_SERVER_IP} \
  --set k8sServicePort=${API_SERVER_PORT} \
  --set loadBalancer.mode=dsr \
  --set routingMode=native \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR=10.0.0.0/16

Notice routingMode=native (which replaced the older tunnel=disabled flag) and autoDirectNodeRoutes=true. We are leveraging native routing. This removes encapsulation overhead (VXLAN/Geneve), which is critical when you need low latency connectivity. Your packets hit the wire raw.
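After the rollout, verify that kube-proxy replacement and native routing actually took effect. A minimal check (same cilium-dbg caveat as above):

# Should report KubeProxyReplacement: True and Routing: Native
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep -E 'KubeProxyReplacement|Routing'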

Service Discovery: IPVS vs. Iptables

If you aren't ready to go full eBPF yet, or you are stuck with legacy requirements, you absolutely must switch kube-proxy to IPVS mode. Iptables is a linear list; IPVS is a hash table. The lookup time difference when you have 5,000 services is the difference between a snappy API and a timeout.
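Switching is a one-line change in the kube-proxy configuration. A minimal sketch of the relevant part of the KubeProxyConfiguration (with kubeadm this lives in the kube-proxy ConfigMap in kube-system):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"   # round-robin; "lc" (least connection) is another common choice

Roll the kube-proxy DaemonSet after editing the ConfigMap so the new mode takes effect.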

However, IPVS requires specific kernel modules to be loaded on the host. On a CoolVDS KVM instance, you have full control over the kernel, unlike shared container hosting. Ensure these are loaded:

# /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
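To load them immediately without a reboot and confirm they are present:

sudo modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack
lsmod | grep -E 'ip_vs|nf_conntrack'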

Then verify your limits. A common killer in high-traffic Norwegian e-commerce sites is hitting the nf_conntrack limit during Black Friday sales.

sysctl -w net.netfilter.nf_conntrack_max=131072

Pro Tip: Monitor your conntrack usage. If you see "table full, dropping packet" in dmesg, your fancy Kubernetes cluster is effectively offline for new customers. We set our CoolVDS templates with higher defaults, but you should always tune this based on your RAM.
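A quick way to watch your headroom and make the limit survive reboots (the 131072 figure is just the example above; size it to your RAM):

# Current entries vs. the configured ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# Persist the setting across reboots
echo 'net.netfilter.nf_conntrack_max = 131072' | sudo tee /etc/sysctl.d/90-conntrack.conf
sudo sysctl --system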

The Gateway API: Ingress is Dead, Long Live the Gateway

By mid-2025, the standard Ingress resource is considered "legacy" for complex routing. The Gateway API (v1.1+) offers a standardized way to handle traffic splitting, header modification, and traffic mirroring without proprietary annotations.

Here is how a modern route looks. This splits traffic between two versions of an app—essential for canary deployments without external tools.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store-route
spec:
  parentRefs:
  - name: external-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: store-v1
      port: 80
      weight: 90
    - name: store-v2
      port: 80
      weight: 10

This declarative approach is cleaner and less brittle than the NGINX annotation spaghetti we used to write in 2021.
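For reference, the parent Gateway that the route attaches to can be as simple as the sketch below. The cilium gatewayClassName is an assumption; use whichever controller class your cluster actually runs.

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: cilium
  listeners:
  - name: http
    protocol: HTTP
    port: 80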

The Hardware Reality: Etcd needs NVMe

You can tune your network stack all day, but Kubernetes is a distributed system reliant on a consistent state store: etcd. Every time a pod starts, stops, or a service updates, etcd writes to disk. If disk fsync latency is high, the API server slows down. If the API server slows down, network updates lag.

I ran a benchmark comparing standard SSD VPS providers against CoolVDS NVMe storage. The difference in fsync latency is massive.

# Fio benchmark simulating etcd write patterns
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd --size=100m --bs=2300 \
    --name=etcd_benchmark

Results:

Storage Type                Fsync Latency (99th percentile)    Etcd Health
Standard SSD (Shared)       15ms - 40ms                        Unstable Leader Elections
CoolVDS NVMe (Dedicated)    0.5ms - 2ms                        Rock Solid
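You can confirm the same thing on a live cluster from etcd's own histogram; the p99 WAL fsync duration should stay comfortably in single-digit milliseconds. A sketch assuming kubeadm's default metrics listener on 127.0.0.1:2381:

# Raw fsync histogram straight from etcd
curl -s http://127.0.0.1:2381/metrics | grep etcd_disk_wal_fsync_duration_seconds

# Equivalent p99 in PromQL:
# histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))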

If you are building a cluster in 2025, spinning rust and cheap shared SSDs are non-starters.

Local Context: Latency and Compliance

For those of us operating in Norway, the path to the user matters. If your servers are in Frankfurt but your customers are in Trondheim, you are adding 20-30ms of round-trip time unnecessarily. Routing through the Norwegian Internet Exchange (NIX) in Oslo ensures that local traffic stays local.
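It takes two minutes to verify this from a node. The hostnames below are placeholders for your own endpoints:

# Compare round-trip time from your cluster to an Oslo vs. a Frankfurt origin
mtr --report --report-cycles 10 <your-oslo-endpoint>
mtr --report --report-cycles 10 <your-frankfurt-endpoint>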

Furthermore, with the tightening of Datatilsynet regulations regarding data sovereignty, hosting your encryption keys and database volumes on Norwegian soil is not just a technical preference; it is often a legal requirement. CoolVDS infrastructure is physically located in Oslo, ensuring you meet GDPR and local compliance standards without needing a lawyer to interpret your cloud contract.

Conclusion: Build on Solid Ground

Kubernetes networking is complex, but it is manageable if you respect the layers. Use eBPF to bypass legacy kernel bottlenecks. Adopt the Gateway API for sanity. But most importantly, recognize that software cannot fix bad hardware.

Your network is only as fast as the CPU interrupting it and the disk backing it. Don't let IO_WAIT be the reason your cluster fails.

Ready to see what raw NVMe performance does for your K8s control plane? Deploy a CoolVDS instance in Oslo today and check your latency.