Kubernetes Networking in Production: Beyond the Defaults
Most Kubernetes tutorials lie to you. They spin up a minikube instance, deploy an Nginx pod, curl localhost, and call it a day. In the real world—specifically when you are running high-traffic workloads across Northern Europe—the default kube-proxy implementation based on iptables is a bottleneck waiting to strangle your throughput.
I've spent the last month debugging a cluster that was randomly dropping 0.5% of packets. No logs in the application. No errors in the ingress controller. Just silent failures. The culprit? Conntrack table exhaustion on the nodes, compounded by an underlying virtualization layer that couldn't keep up with the packet churn. This guide is for those who need their clusters to survive Black Friday, not just a demo day.
The CNI Decision: Why eBPF is Non-Negotiable in 2023
If you are still using Flannel or Weave Net for high-performance production workloads in late 2023, stop. You are incurring massive overhead. The industry has shifted toward eBPF (extended Berkeley Packet Filter), which lets the kernel process packets in small programs attached directly to the datapath instead of trudging through long chains of traditional iptables rules.
For our Norwegian infrastructure stacks, we standardized on Cilium. Why? Because as your service count grows, kube-proxy's iptables chains grow linearly. O(n) lookups on every packet. Cilium's eBPF hash maps are O(1). Whether you have 10 services or 10,000, the lookup time is the same.
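You can see the difference on a live cluster. On an iptables-mode node, count the NAT chains kube-proxy has programmed; on a cluster already running Cilium, the same lookup is a single entry in a BPF map (this assumes the default Cilium DaemonSet name, ds/cilium, in kube-system):
# On an iptables-mode node: NAT rules scale with the number of Services
sudo iptables-save -t nat | grep -c 'KUBE-SVC'
sudo iptables-save -t nat | wc -l
# On a Cilium node: services live in a constant-time BPF hash map
kubectl -n kube-system exec ds/cilium -- cilium bpf lb list | head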
Pro Tip: When deploying Cilium on CoolVDS or any KVM-based infrastructure, skip kube-proxy entirely and let Cilium's eBPF datapath handle service load balancing. The latency reduction is measurable: internal service-to-service calls often drop from 2ms to sub-0.5ms.
Configuration for Performance
Here is the exact Helm configuration we use to strip out the legacy overhead. This assumes you are running Kubernetes 1.26+ (current standard as of Oct 2023).
helm install cilium cilium/cilium --version 1.14.2 \
--namespace kube-system \
--set kubeProxyReplacement=true \
--set k8sServiceHost=${API_SERVER_IP} \
--set k8sServicePort=${API_SERVER_PORT} \
--set bpf.masquerade=true \
--set ipam.mode=kubernetes
By setting kubeProxyReplacement=true, we bypass the legacy netfilter path entirely. This is critical for low-latency applications, such as real-time financial trading or VoIP services hosted out of Oslo.
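Once the install settles, it is worth confirming that the eBPF datapath actually owns service handling. A quick sanity check, assuming the default DaemonSet name from the Helm chart:
# Should report "KubeProxyReplacement: True"
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement
# And kube-proxy itself should be gone
kubectl -n kube-system get ds kube-proxy 2>/dev/null || echo "kube-proxy not installed - good"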
The Physical Layer: Where Virtualization Meets Reality
Overlay networks (VXLAN/Geneve) cost CPU cycles. Every packet is encapsulated and decapsulated. If your underlying VPS provider overcommits CPU (your cycles show up as steal time), your network throughput drops because the kernel simply isn't scheduled in time to service the interrupts.
This is why we architect CoolVDS differently. We map KVM threads to physical cores strictly. When you push 10Gbps of traffic, you need the CPU to be awake, not waiting for a neighbor to finish their PHP script.
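You don't need Kubernetes tooling to check whether your provider is taking cycles back. Watch the steal column under real load; anything consistently above a couple of percent will eventually show up as network jitter:
# 'st' is the percentage of time the hypervisor withheld the vCPU from you
vmstat 1 5
# Per-core view, if the sysstat package is installed
mpstat -P ALL 1 5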
Optimizing the Kernel for Throughput
Before you even install Kubernetes, your base OS (Debian 12 or Ubuntu 22.04) needs tuning. The defaults are set for a desktop, not a router.
# /etc/sysctl.d/99-k8s-network.conf
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Allow more connections to be tracked (critical for NAT)
net.netfilter.nf_conntrack_max = 262144
# Reuse TIME_WAIT sockets
net.ipv4.tcp_tw_reuse = 1
# BBR Congestion Control (Standard in 2023 for WAN traffic)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply these with sysctl --system (a bare sysctl -p only reads /etc/sysctl.conf and skips files in /etc/sysctl.d/). If you miss the nf_conntrack_max setting, your pods will randomly fail to connect to external APIs (like Stripe or Vipps) during peak loads.
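If you suspect you are already hitting that wall, compare the live conntrack count against the ceiling and check the kernel log; the "table full, dropping packet" message is the smoking gun:
# Live conntrack usage vs. the configured ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# The kernel complains here when the table overflows
dmesg | grep -i 'nf_conntrack: table full'
# Confirm BBR and fq actually took effect
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc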
Ingress vs. Gateway API: The 2023 Shift
The Ingress resource is frozen. It's done. As of this year, the Gateway API (now reaching v0.8.0 stability) is the future. It splits the role of "Cluster Operator" (who manages the load balancer) from the "Developer" (who manages the routes).
Why does this matter for hosting in Norway? Compliance. You can enforce TLS policies at the Gateway level (ensuring traffic is encrypted inside the cluster) while letting devs manage their paths. This satisfies strict Datatilsynet requirements regarding data movement.
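As a sketch of the operator side of that split, here is a minimal Gateway that terminates TLS with a certificate the platform team controls. The name matches the external-gateway referenced in the route below; the GatewayClass and Secret names are illustrative and depend on your controller:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: external-gateway
  namespace: production
spec:
  gatewayClassName: cilium          # substitute the GatewayClass your controller provides
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: checkout-tls-cert     # example TLS Secret managed by the platform team
    allowedRoutes:
      namespaces:
        from: Same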
Example: HTTPRoute with Traffic Splitting
This was a nightmare with Nginx Ingress annotations. With Gateway API, it's structured YAML:
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: checkout-split
  namespace: production
spec:
  parentRefs:
  - name: external-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /checkout
    backendRefs:
    - name: checkout-v1
      port: 8080
      weight: 90
    - name: checkout-v2
      port: 8080
      weight: 10
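After applying it, the route's status conditions tell you whether the Gateway accepted it and resolved both backends. A quick check against the names used above:
# Both Accepted and ResolvedRefs should report status True
kubectl -n production get httproute checkout-split \
  -o jsonpath='{range .status.parents[0].conditions[*]}{.type}={.status}{"\n"}{end}'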
Data Sovereignty & Latency: The NIX Factor
Technically, you can host Kubernetes anywhere. Legally and practically, location dictates success. If your user base is in Scandinavia, routing traffic through Frankfurt adds 20-30ms of round-trip time. That doesn't sound like much until you have a microservices chain where Service A calls Service B, which calls Service C.
Latency compounds.
| Route | Avg RTT | Impact on K8s |
|---|---|---|
| Oslo local (CoolVDS) | < 2ms | Instant etcd sync |
| Oslo to Frankfurt | ~25ms | Slow leader election |
| Oslo to US East | ~90ms | Frequent timeouts |
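Don't take that table on faith; measure from the box you actually rent. A rough average over 20 probes per region looks like this (the hostnames are placeholders, substitute endpoints you care about):
# Average RTT per target; swap in real endpoints for each region
for target in oslo.example.com frankfurt.example.com us-east.example.com; do
  echo -n "$target: "
  ping -c 20 -q "$target" | awk -F'/' '/rtt|round-trip/ {print $5 " ms avg"}'
done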
Furthermore, the Schrems II ruling means that transferring the personal data of EU/EEA residents to US-owned clouds now requires serious legal gymnastics. Hosting on a sovereign Norwegian provider like CoolVDS simplifies your GDPR compliance stance immediately. Your data stays in Oslo. Period.
The Storage Bottleneck in Networking
Wait, why talk about storage in a networking post? Because Kubernetes networking state lives in etcd. If etcd is slow, your service updates are slow. If a node fails, the network convergence time depends on how fast etcd can write that state change to disk.
We see this constantly: developers pay for "High CPU" instances but get stuck on spinning rust or SATA SSDs. etcd requires low-latency fsync. At CoolVDS, we use NVMe storage exclusively. This isn't marketing fluff; it's physics. NVMe gives you thousands of parallel queues; AHCI/SATA gives you one. When your cluster tries to reschedule 50 pods after a node crash, that I/O latency determines whether you are down for 5 seconds or 5 minutes.
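The standard way to find out whether a disk is fit for etcd is an fsync-heavy fio run; etcd's own guidance is to keep the 99th percentile fdatasync latency under roughly 10ms. A minimal sketch (the scratch directory is just an example path):
# Scratch directory on the volume you intend to use for etcd
mkdir -p /var/lib/etcd-bench
# 2300-byte writes with an fdatasync after each one, mimicking etcd's WAL pattern
fio --name=etcd-fsync-test --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Read the fsync/fdatasync percentile block in the output; p99 should be well under 10ms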
Debugging the Invisible
When things break (and they will), `kubectl logs` is rarely enough. You need to inspect the wire. We recommend deploying a transient debugging pod attached to the host network.
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot \
  --overrides='{"spec": {"hostNetwork": true}}' -- /bin/bash
Once inside, use `tcpdump` to verify if packets are actually leaving the node. If you see them leave the pod veth pair but not the eth0 interface, you have an iptables/eBPF drop issue.
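From inside that shell (now sharing the node's network namespace), capture on the node's uplink while you reproduce the failure; then ask the Cilium agent whether it is the one dropping the traffic. The interface name and pod IP below are examples, adjust for your node:
# Inside the netshoot shell: watch traffic from a suspect pod IP on the uplink
tcpdump -ni eth0 host 10.0.1.23 and port 443
# From your workstation: stream Cilium's drop events, including the drop reason
kubectl -n kube-system exec ds/cilium -- cilium monitor --type drop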
Kubernetes is powerful, but it assumes your underlying infrastructure is robust. If you are fighting noisy neighbors, DDoS attacks, or slow disks, no amount of YAML will save you. Build on a foundation that respects the physics of networking.
Ready to stop fighting latency? Deploy a high-performance, Norway-based K8s node on CoolVDS in under 60 seconds and see the difference NVMe makes.