The Packet Doesn't Lie: Architecting K8s Networking for Performance
Most developers treat Kubernetes networking as magic. You create a Service, the traffic flows, and everyone is happy. Until they aren't. Until the connection tracking table fills up, or you realize your microservices are spending 40% of their time waiting on DNS resolution because of a bottleneck in CoreDNS.
I have spent too many nights staring at tcpdump output on a production cluster, watching packets die silently because of a default configuration. If you are deploying in Norway, relying on the default iptables mode in 2022 is a dereliction of duty. Whether you are running a fintech application that must comply with Datatilsynet's requirements or a high-traffic e-commerce site targeting customers in Oslo, the defaults will kill you.
We are going to rip open the hood. No fluff. Just kernel flags, CNI plugins, and the architectural decisions that separate a toy cluster from a production beast.
1. The CNI Wars: Calico, Flannel, or Cilium?
Your choice of Container Network Interface (CNI) defines your cluster's performance floor. In the early days, Flannel was the go-to because it was simple. But Flannel creates a VXLAN overlay that encapsulates traffic. Encapsulation means CPU overhead. On a loaded node, that overhead translates to latency.
For 2022 production workloads, you generally have two serious choices:
- Calico: The industry standard. It can run in VXLAN mode, but for raw performance you want BGP mode (backed by the BIRD routing daemon). This routes packets natively, without encapsulation, assuming your underlying network supports it.
- Cilium: The modern contender using eBPF. It bypasses parts of the standard Linux network stack for massive throughput gains.
If you are hosting on CoolVDS, where you have full KVM isolation and control over your virtual network interfaces, I recommend looking hard at Cilium. Why? Because it replaces kube-proxy entirely.
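If you want to see what kube-proxy-free mode looks like, here is a minimal Helm sketch. The API server address is a placeholder and the chart values reflect the Cilium releases current in 2022; check the docs for your version.
# Add the Cilium Helm repo and install with kube-proxy replacement enabled.
# k8sServiceHost/k8sServicePort must point at your own API server (placeholder below).
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system \
  --set kubeProxyReplacement=strict \
  --set k8sServiceHost=10.0.0.10 \
  --set k8sServicePort=6443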
Configuring Calico for BGP (No Encapsulation)
If you stick with Calico, disable IPIP encapsulation when your nodes share a Layer 2 network. This shrinks packets and saves CPU. With the Tigera operator, that means setting the pool's encapsulation to None on the default Installation resource:
kubectl patch installation default --type=merge -p '{
  "spec": {
    "calicoNetwork": {
      "bgp": "Enabled",
      "ipPools": [{
        "cidr": "192.168.0.0/16",
        "encapsulation": "None"
      }]
    }
  }
}'
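After the patch, confirm that BGP peering is actually up and that the pool dropped encapsulation (assuming calicoctl is installed on the node):
# BGP peers should show "Established" in the output
sudo calicoctl node status
# IPIPMODE and VXLANMODE should both read "Never" for the pool
calicoctl get ippool -o wide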
2. Death to iptables: Switch to IPVS
By default, Kubernetes uses iptables to route traffic to Services. This works fine for 50 services. It works okay for 500. But iptables uses a sequential list of rules. It is an O(n) algorithm. Every packet has to be checked against the list until a match is found.
When you scale to thousands of services, latency spikes. The solution is IPVS (IP Virtual Server). IPVS is a kernel-space load balancer that uses hash tables. It is O(1). It doesn't care if you have 10 services or 10,000. The lookup time is constant.
Pro Tip: Before enabling IPVS, ensure your worker nodes have the necessary kernel modules loaded. On a CoolVDS NVMe instance, we leave these available, but some stripped-down cloud kernels remove them.
Load the modules:
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack
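modprobe does not survive a reboot. On a systemd-based distro, persist the list with a modules-load.d file:
# systemd-modules-load reads this at boot and loads the IPVS modules for you
cat <<EOF | sudo tee /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF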
Edit your Kube-Proxy ConfigMap:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
  scheduler: "rr"  # Round Robin
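kube-proxy only reads this configuration at startup, so restart the DaemonSet and verify the mode actually switched (ipvsadm comes from your distro's packages):
# Restart kube-proxy so it picks up the new mode
kubectl -n kube-system rollout restart daemonset kube-proxy
# On a worker node: kube-proxy should now be programming IPVS virtual servers
sudo ipvsadm -Ln
# The proxier also reports its active mode on the metrics port
curl -s http://localhost:10249/proxyMode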
3. The Ingress Bottleneck
Traffic enters your cluster via Ingress. Usually, this is Nginx. The default Nginx configuration in the community Ingress Controller is designed for compatibility, not speed. It creates a lot of short-lived connections to your backend pods.
In a high-latency scenario—say, routing traffic from Northern Norway to a server in Oslo—the TCP handshake overhead adds up. You need to enable keepalives between the Ingress controller and your upstream pods.
Here is a production-ready snippet for your Nginx Ingress ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Keep connections to backends open to avoid handshake overhead
  upstream-keepalive-connections: "32"
  upstream-keepalive-timeout: "10"
  upstream-keepalive-requests: "1000"
  # Tune buffers for modern payloads
  proxy-buffer-size: "16k"
  proxy-buffers-number: "4"
  proxy-busy-buffers-size: "64k"
  # Norway specific: HSTS is mandatory for many compliance standards here
  hsts: "true"
  hsts-max-age: "31536000"
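To confirm the keepalive settings made it into the rendered configuration, grep the generated nginx.conf inside the controller pod (the deployment name below matches a default ingress-nginx install; adjust it to yours):
# Dump the generated nginx.conf and check the upstream keepalive directives
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  cat /etc/nginx/nginx.conf | grep keepalive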
4. Local Nuance: Latency and Sovereignty
In 2022, data sovereignty is the elephant in the room. Following the Schrems II ruling, moving personal data outside the EEA is a legal minefield. Hosting on hyperscalers often routes traffic through Frankfurt or Amsterdam, adding 20-30ms of latency and potential legal headaches.
For Norwegian workloads, physics matters. Light moves at a finite speed. Routing traffic through a local hub like NIX (Norwegian Internet Exchange) in Oslo ensures your latency stays in the single digits (2-5ms) for local users.
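A quick way to check whether your traffic stays local or detours through the continent (the hostname below is a placeholder for your own endpoint):
# mtr shows the path and per-hop latency; look for Oslo/NIX hops, not Frankfurt
mtr --report --report-cycles 20 app.example.no
# Plain ping gives the RTT baseline; expect single-digit milliseconds within Norway
ping -c 10 app.example.no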
However, low latency requires stable compute. Encapsulated overlays (VXLAN/IPIP) burn CPU on every packet, and even native routing needs headroom. If your underlying VPS sits on oversold hardware with "noisy neighbors" stealing CPU cycles, your network throughput will jitter. This is why we built CoolVDS on pure KVM with dedicated resource allocation. We don't oversell CPU, so your network stack gets the cycles it needs to process packets instantly.
5. Kernel Tuning for High Throughput
Linux defaults are set for general-purpose computing, not high-throughput packet switching. You need to tune sysctl.
If you see nf_conntrack: table full in your logs, your server is dropping packets because it can't track new connections. Increase the limits.
# /etc/sysctl.d/k8s-net.conf
# Increase connection tracking table
net.netfilter.nf_conntrack_max = 131072
# Reuse TIME-WAIT sockets
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# BBR Congestion Control (Great for unstable mobile networks)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
Apply with sysctl -p /etc/sysctl.d/k8s-net.conf.
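Two quick checks after applying: confirm BBR is active (it ships as a kernel module on most distros) and keep an eye on conntrack usage against the new ceiling.
# Load the BBR module if the congestion control sysctl did not take effect
sudo modprobe tcp_bbr
sysctl net.ipv4.tcp_congestion_control
# Current conntrack entries vs. the configured maximum
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max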
Conclusion: Verify Everything
Don't assume your network is working. Verify it. Run iperf3 between pods on different nodes. If you aren't getting close to the line speed of your interface, check your CNI encapsulation and your MTU settings (usually 1450 for VXLAN, 1500 for direct routing).
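A rough pod-to-pod benchmark can look like the sketch below. The image is a commonly used community iperf3 image, not a requirement (it assumes the entrypoint is iperf3); pin whatever you trust, and make sure the two pods land on different nodes (nodeSelector or nodeName) so you measure cross-node traffic.
# Start an iperf3 server pod and grab its IP
kubectl run iperf-server --image=networkstatic/iperf3 -- -s
kubectl get pod iperf-server -o wide
# Run a throwaway client against the server pod's IP (replace the placeholder address)
kubectl run iperf-client --rm -it --image=networkstatic/iperf3 -- -c 10.244.1.23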
Kubernetes networking is brittle if you ignore the details. But if you tune the kernel, choose the right CNI, and run it on hardware that respects your resource needs, it is bulletproof.
Stop fighting latency on oversold clouds. Spin up a KVM-isolated, NVMe-backed instance in Oslo with CoolVDS and see what your network stack is actually capable of.