Kubernetes Networking Deep Dive: Surviving the Packet Drop Nightmare
I still wake up in a cold sweat thinking about a deployment I managed back in 2023. We had a perfectly stateless microservices architecture, automated scaling, and a CI/CD pipeline that was a work of art. Yet, every day at 14:00 Oslo time, our checkout service latency spiked from 40ms to 4 seconds. No logs in the application. No CPU spikes on the pods.
The culprit? Soft IRQ throttling on the host node because the underlying cloud provider was overselling their CPU cycles. The virtual interface couldn't process packets fast enough.
Kubernetes networking is hard. It is an abstraction built on top of an abstraction, relying on Linux kernel primitives that were designed decades ago. In 2025, we have better tools: eBPF is mainstream and the Gateway API has finally matured. But the physics of networking haven't changed. If your underlying infrastructure is garbage, your Kubernetes cluster is just a complex way to generate timeout errors.
The CNI Decision: Why iptables is Dead to Me
For years, kube-proxy using iptables was the default. It works fine for small clusters, but when you scale to hundreds of Services, iptables becomes a linked-list nightmare: every new connection has to traverse a linear chain of rules, an O(n) lookup that only gets worse as the rule count grows. It is slow.
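If you want to see how big that chain actually gets, you can count the NAT rules kube-proxy has programmed on a node. The exact chain names vary between versions, so treat this as a rough gauge rather than a precise measurement:

```bash
# Rough gauge of how many NAT rules kube-proxy has programmed on this node.
# Run on the node itself; the count grows with every Service and endpoint.
sudo iptables-save -t nat | grep -c 'KUBE-'
```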
In 2025, if you are building a production cluster in Norway, where users expect instant interaction, you should be using a CNI based on eBPF (extended Berkeley Packet Filter). We rely heavily on Cilium for this.
eBPF allows the kernel to run sandboxed programs in response to events (like a packet arriving). It bypasses the bloated iptables chain entirely. This reduces latency and, crucially, CPU overhead.
Configuring Cilium for Performance
Don't just install the defaults. Here is a production-ready values.yaml snippet we use to squeeze maximum performance out of our CoolVDS instances:
```yaml
cluster:
  name: "coolvds-oslo-01"
  id: 1
kubeProxyReplacement: "true"
l7Proxy: true
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
ipam:
  mode: "kubernetes"
bpf:
  # Enable BPF masquerading for performance
  masquerade: true
  # Use BPF host routing instead of the legacy host stack (kernel 5.10+)
  hostLegacyRouting: false
# Direct Server Return (DSR) to preserve client source IP and reduce hops
loadBalancer:
  mode: "dsr"
```
Pro Tip: Setting bpf.hostLegacyRouting: false keeps packet forwarding in eBPF and bypasses the upper networking stack of the host entirely. However, BPF host routing requires a kernel version 5.10 or higher. At CoolVDS, our standard KVM images run current 6.x kernels, ensuring this feature actually works out of the box.
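A quick sanity check after rolling this out. The grep patterns match the typical cilium status output; field names can shift slightly between releases, so adjust as needed:

```bash
# Confirm the node kernel is new enough for BPF host routing (>= 5.10)
uname -r

# Ask the Cilium agent what it actually enabled
kubectl -n kube-system exec ds/cilium -- cilium status | \
  grep -E 'KubeProxyReplacement|Host Routing|Masquerading'
```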
The Shift to Gateway API
We spent years hacking NGINX Ingress Controllers to do things they weren't meant to do. The Gateway API (gateway.networking.k8s.io/v1) is the standard now. It separates the role of the Infrastructure Provider (us, setting up the load balancers) from the Application Developer (you, defining routes).
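For context, here is a minimal sketch of the provider-side Gateway that the route below attaches to. The gatewayClassName and the certificate Secret name are assumptions; substitute whatever your controller and certificate tooling actually provide.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: public-gateway
  namespace: gateway-infra
spec:
  gatewayClassName: cilium          # assumed; use the class your controller registers
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      hostname: "*.coolvds-customer.no"
      tls:
        mode: Terminate
        certificateRefs:
          - name: wildcard-cert     # assumed TLS Secret in gateway-infra
      allowedRoutes:
        namespaces:
          from: All
```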
Here is how you define a clean HTTP route in 2025, avoiding the messy annotations of the past:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payment-route
  namespace: payments
spec:
  parentRefs:
    - name: public-gateway
      namespace: gateway-infra
  hostnames:
    - "api.coolvds-customer.no"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /v1/charge
      backendRefs:
        - name: payment-service
          port: 8080
          weight: 100
      filters:
        - type: RequestHeaderModifier
          requestHeaderModifier:
            add:
              - name: X-Region
                value: "no-oslo-1"
```
This structure is declarative and portable. It also integrates better with Datatilsynet requirements for data flow control, as you can strictly define header modifications and traffic splitting at the infrastructure edge.
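As an illustration of that traffic splitting, weighted backendRefs turn a canary rollout into plain declarative config. The canary Service name here is hypothetical:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payment-canary-split
  namespace: payments
spec:
  parentRefs:
    - name: public-gateway
      namespace: gateway-infra
  rules:
    - backendRefs:
        - name: payment-service
          port: 8080
          weight: 90
        - name: payment-service-canary   # assumed canary Service
          port: 8080
          weight: 10
```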
The Hardware Reality: Why Your "Cloud" is Slow
Software-defined networking requires CPU cycles. When a packet hits your node, an interrupt fires and the kernel has to stop what it is doing to process that packet in softirq context. If you are on a noisy, over-provisioned VPS, your "vCPU" is fighting for time on the physical core.
The result is steal time eating into packet processing: the kernel is ready to run its softirq handlers, but the hypervisor has scheduled the physical core to another tenant. You won't see it in the application logs. You will only see high latency.
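You can at least see the symptoms from inside the guest. These checks are generic Linux, nothing Kubernetes-specific (mpstat comes from the sysstat package):

```bash
# Per-vCPU view of %steal (hypervisor taking the core) and %soft (softirq load)
mpstat -P ALL 1 5

# /proc/net/softnet_stat has one row per CPU; the second hex column counts
# packets dropped because the softirq backlog overflowed
cat /proc/net/softnet_stat
```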
We benchmarked this. We ran a standard iperf3 test between two pods on different nodes. One set was on a budget "cloud" provider, the other on CoolVDS High-Frequency Compute instances.
| Metric | Budget VPS | CoolVDS (NVMe + Dedicated) |
|---|---|---|
| Throughput | 1.2 Gbits/sec (variable) | 9.8 Gbits/sec (stable) |
| Retransmits | 142 | 0 |
| Latency (P99) | 18ms | <1ms |
The zero retransmits on CoolVDS aren't magic. It's simply because we don't oversubscribe the host's network and I/O bandwidth. When you are pushing gigabits of traffic for a Kubernetes cluster, you need the underlying I/O to support it.
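If you want to reproduce the comparison yourself, a rough pod-to-pod iperf3 run looks like the following. The image and pod names are illustrative; schedule the two pods on different nodes (for example with a nodeSelector) so you actually exercise the inter-node path:

```bash
# Server pod (the image's entrypoint is iperf3)
kubectl run iperf-server --image=networkstatic/iperf3 --port=5201 -- -s

# Client pod: 30-second run with 4 parallel streams against the server pod IP
SERVER_IP=$(kubectl get pod iperf-server -o jsonpath='{.status.podIP}')
kubectl run iperf-client --rm -it --image=networkstatic/iperf3 -- -c "$SERVER_IP" -t 30 -P 4
```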
Debugging Network Policies
Security is not optional. The default Kubernetes policy is "allow all." In a GDPR-heavy environment like Norway, that is negligence. You need NetworkPolicies. But debugging them is painful. Is it the firewall? Is it the policy? Is it DNS?
Here is a troubleshooting workflow using netshoot, a crucial container utility:
```bash
# 1. Spin up a temporary debug pod in the namespace giving you trouble
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash

# 2. Inside the pod, test connectivity with specific timeouts
nc -z -v -w 2 payment-service 8080

# 3. Check DNS resolution times (often the hidden killer)
dig payment-service.payments.svc.cluster.local +stats
```
If nc times out, check your policy. A standard "deny-all" default policy looks like this:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```
Once applied, nothing enters or leaves. You must explicitly whitelist traffic. We often see developers forget to whitelist DNS traffic (UDP port 53) to CoreDNS. Result: everything breaks mysteriously.
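A minimal egress policy that re-allows DNS after the deny-all above looks like this, assuming CoreDNS runs in kube-system with the standard k8s-app: kube-dns label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```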
Local Latency Matters
For Norwegian businesses, hosting your K8s cluster in Frankfurt or Amsterdam adds 15-30ms of round-trip time. For a real-time application or a high-frequency trading bot, that is an eternity.
By placing your nodes in Oslo, peering directly at NIX, you drop that latency to 1-3ms for local users. CoolVDS infrastructure is physically located to optimize this path. We treat the network route to the Norwegian backbone as a critical component, not an afterthought.
Final Thoughts
Kubernetes networking is less about magic and more about removing bottlenecks. You remove the iptables bottleneck with eBPF. You remove the ingress bottleneck with Gateway API. And you remove the physical bottleneck by choosing a provider that respects hardware limits.
Don't let packet drops determine your uptime. Build on a foundation that can handle the load.
Ready to see what your cluster can really do? Deploy a CoolVDS NVMe instance in Oslo today and benchmark the difference yourself.