Kubernetes Networking in Production: Surviving the Overlay Maze
Let’s be honest: Kubernetes networking is where the dream of "container orchestration" usually wakes up screaming. You deploy a microservice, the pod status turns green, but `curl` times out. Is it DNS? Is it the iptables rules? Or did you hit a Maximum Transmission Unit (MTU) mismatch between the overlay and the underlay? I have spent more Friday nights debugging packet drops in `kube-proxy` than I care to admit.
If you are running Kubernetes in production in 2020, the "it works on my machine" mentality is a liability. The flat network model Kubernetes mandates is elegant in theory but brutal in practice if you don't understand what's happening under the hood. In this deep dive, we are going to dissect the CNI (Container Network Interface), discuss why `ipvs` beats `iptables` at scale, and look at why hosting this in Norway (specifically on compliant infrastructure like CoolVDS) is no longer just about latency—it's about legal survival.
The Lie of the "Flat Network"
Kubernetes assumes every pod can talk to every other pod without Network Address Translation (NAT). This is the golden rule. But unless you are running on a pure routed network (like BGP with Calico in direct mode), you are likely wrapping packets inside packets. This encapsulation (VXLAN or IPIP) costs CPU cycles. If your underlying VPS infrastructure has "noisy neighbors" stealing your CPU time, that network processing stalls, and your latency spikes.
I recently audited a cluster for a fintech startup in Oslo. They were complaining about random 500ms delays on internal API calls. They blamed the code. I blamed the infrastructure.
Pro Tip: Always check your node's entropy and CPU steal time. Networking is CPU intensive. If %st (steal time) in top is above 2-3%, your network throughput will suffer regardless of your CNI configuration.
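A quick way to eyeball both numbers on a node (a minimal sketch, assuming a standard Linux box with the procps tools installed, which most distros ship):
# CPU steal time shows up in the "st" column; watch it over a few seconds
vmstat 1 5
# One-shot view of the same data
top -bn1 | grep "Cpu(s)"
# Available entropy; persistently low values (a few hundred) can stall TLS-heavy workloads
cat /proc/sys/kernel/random/entropy_avail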
Choosing Your Weapon: CNI Plugins in 2020
The Container Network Interface (CNI) is the translation layer between the Kubelet and the network. Choosing the wrong one is a migration nightmare waiting to happen.
| CNI Plugin | Mechanism | Pros | Cons |
|---|---|---|---|
| Flannel | VXLAN | Dead simple. Works everywhere. | No Network Policies. Security is zero. |
| Calico | BGP / IPIP | Enterprise-grade security policies. High performance. | Complex to debug if BGP breaks. |
| Weave Net | Mesh VXLAN | Encryption out of the box. | Can be resource-heavy on large clusters. |
For most serious deployments on CoolVDS, I recommend Calico. Why? Because Network Policies are mandatory for security. You cannot afford to have your front-end web scraper talking directly to your payment database just because they live in the same cluster.
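As a rough sketch of what that segmentation looks like in practice, a NetworkPolicy along these lines locks a database down so only the payment API pods can reach it. The namespace, labels, and port below are placeholders for your own workloads:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-db-lockdown
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payments-db       # placeholder label for your database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: payments-api  # only the payment API may connect
    ports:
    - protocol: TCP
      port: 5432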
Deploying Calico (The Right Way)
Don't just apply the default manifest blindly. You need to ensure the MTU matches your underlying interface. The standard Ethernet MTU is 1500. VXLAN headers add 50 bytes. If your inner packet is 1500, it fragments, and performance tanks.
# Inspecting the current MTU on your node
ip addr show eth0 | grep mtu
# When applying Calico, you might need to adjust the veth MTU via the ConfigMap
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  veth_mtu: "1440" # Safe bet for VXLAN/IPIP overlays
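Once the ConfigMap is in place, a don't-fragment ping between two nodes is a cheap way to confirm your assumptions about the underlay MTU (swap 10.0.0.2 for one of your real node IPs):
# 1472 bytes of payload + 20 (IP header) + 8 (ICMP header) = 1500; -M do sets the DF bit
ping -c 3 -M do -s 1472 10.0.0.2
# A "message too long" error means the underlay MTU is below 1500 and veth_mtu must go lower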
Service Discovery: When DNS Breaks (And It Will)
In Kubernetes 1.18+, CoreDNS is the standard. But CoreDNS is just a Go binary that can be overwhelmed. If you have a PHP application opening a new connection for every request without keep-alive, you will hit the conntrack table limits on the node.
Here is how to quickly diagnose if your Service discovery is failing or if it's the network:
# 1. Run a temporary busybox pod
kubectl run -it --rm --restart=Never debug-pod --image=busybox:1.28
# 2. Inside the pod, test DNS resolution
nslookup kubernetes.default
# 3. If that works, test TCP connectivity to a specific Service
telnet my-service.default.svc.cluster.local 80
If nslookup takes more than 2 seconds, your CoreDNS pods are likely throttling. This often happens on budget VPS providers where I/O latency slows down the etcd cluster (which the Kubernetes API, and therefore CoreDNS, depends on). This is why CoolVDS uses NVMe storage by default. The low latency on disk writes keeps etcd happy, which keeps CoreDNS happy, which keeps your production live.
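If you suspect the conntrack table rather than CoreDNS itself, a minute on the node settles it (these proc paths assume the nf_conntrack module is loaded, which it will be on any worker running kube-proxy):
# Current number of tracked connections vs. the configured ceiling
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# Kernel messages like "nf_conntrack: table full, dropping packet" confirm exhaustion
dmesg | grep -i conntrack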
The Proxy Wars: iptables vs. IPVS
By default, kube-proxy uses iptables to handle Service routing. This works fine for 100 services. But when you scale to 5,000 services, iptables becomes a bottleneck because the rules are evaluated sequentially. It is O(n).
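You can get a feel for how heavy that ruleset already is by counting the NAT rules kube-proxy has programmed on a node (KUBE-SVC is the chain prefix kube-proxy uses for Services):
# Total NAT rules on this node
iptables-save -t nat | wc -l
# Rules belonging to kube-proxy Service chains
iptables-save -t nat | grep -c 'KUBE-SVC'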
In 2020, you should be using IPVS (IP Virtual Server) mode. It uses hash tables, so lookups are O(1): routing cost stays flat no matter how many Services you run.
To enable this, you need to ensure the kernel modules are loaded on your underlying host before starting Kubernetes. On a CoolVDS KVM instance, you have full kernel control, so this is trivial:
# Load required modules
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
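# Note: on kernels 4.19 and newer, nf_conntrack_ipv4 has been merged into nf_conntrack; load that module instead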
# Verify they are loaded
lsmod | grep -e ip_vs -e nf_conntrack_ipv4
Once the modules are loaded, edit your kube-proxy ConfigMap:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr" # Round Robin
  strictARP: true
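Then restart kube-proxy and verify the switch actually took. This assumes a kubeadm-style cluster where kube-proxy runs as a DaemonSet labelled k8s-app=kube-proxy, and that the ipvsadm package is installed on the node:
# Restart kube-proxy so it picks up the new ConfigMap
kubectl -n kube-system rollout restart daemonset/kube-proxy
# The IPVS virtual server table should now list your Service IPs
ipvsadm -Ln
# kube-proxy should report that it is using the ipvs proxier
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs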
The GDPR & Schrems II Factor
We cannot talk about infrastructure in late 2020 without addressing the elephant in the room: Schrems II. The CJEU ruling in July invalidated the Privacy Shield. If you are storing personal data of Norwegian citizens, relying on US-owned cloud providers has become a massive compliance risk.
Latency is physics, but data sovereignty is law. Hosting your Kubernetes nodes on CoolVDS in Oslo solves both. You get sub-millisecond latency to the Norwegian Internet Exchange (NIX), and you keep the Datatilsynet (Norwegian Data Protection Authority) off your back because the data never leaves the EEA.
Optimizing Kernel Parameters for High Load
Out of the box, Linux is tuned for general desktop use, not high-throughput container networking. If you are pushing thousands of packets per second, you need to tune the kernel via sysctl.
Here is the configuration I deploy on every worker node:
# /etc/sysctl.d/k8s-tuning.conf
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Allow more connections to be tracked
net.netfilter.nf_conntrack_max = 1000000
# Enable IP forwarding (Mandatory for K8s)
net.ipv4.ip_forward = 1
# Reduce TIME_WAIT sockets
net.ipv4.tcp_tw_reuse = 1
# Increase TCP buffer sizes for high-speed networks
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
Apply these with sysctl -p /etc/sysctl.d/k8s-tuning.conf. Small tweaks like tcp_tw_reuse can prevent your application from exhausting sockets during traffic spikes.
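A quick read-back confirms the kernel accepted the values (nf_conntrack_max only exists once the conntrack module is loaded):
# Spot-check a couple of the tuned values
sysctl net.ipv4.tcp_tw_reuse net.core.rmem_max
sysctl net.netfilter.nf_conntrack_max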
Ingress: Exposing the Cluster
Finally, traffic needs to get IN. The NGINX Ingress Controller is still the battle-tested king in 2020. However, don't forget to set the proxy-body-size annotation (which controls NGINX's client_max_body_size) if you handle file uploads; otherwise, you will hit the dreaded 413 error.
Here is a snippet for a standard Ingress resource defining a TLS termination:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: production-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - api.coolvds-example.no
    secretName: tls-secret
  rules:
  - host: api.coolvds-example.no
    http:
      paths:
      - path: /
        backend:
          serviceName: backend-service
          servicePort: 80
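With the Ingress live, a crude upload test against the 10m limit shows whether the 413 handling behaves as expected. The host is the example one from the manifest above; adjust the path to an endpoint of yours that actually accepts POSTs:
# Build an 11 MB dummy payload and POST it; expect HTTP 413 with the 10m limit above
dd if=/dev/zero of=/tmp/big.bin bs=1M count=11
curl -sk -o /dev/null -w '%{http_code}\n' -X POST --data-binary @/tmp/big.bin https://api.coolvds-example.no/upload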
Conclusion
Kubernetes networking is complex, but it is manageable if you respect the layers. Start with a solid foundation. You need KVM virtualization that doesn't overcommit resources, fast NVMe storage for etcd stability, and a network that respects your MTU settings.
At CoolVDS, we don't just sell "VPS hosting." We provide the deterministic performance and legal compliance required for modern DevOps in Norway. Don't let a slow underlay ruin your overlay.
Ready to stabilize your cluster? Deploy a high-performance, compliant KVM instance on CoolVDS today and see the difference raw power makes.