Kubernetes Networking Deep Dive: Moving Packets Without Losing Sanity
It usually starts the same way. You run minikube start on your laptop, deploy a few pods, and everything feels magical. Service discovery works. The network mesh is invisible. You feel like a cloud-native god.
Then you push to production.
Suddenly, you're dealing with intermittent DNS failures, 502 Bad Gateways from your Ingress controller, and the dreaded CrashLoopBackOff because etcd timed out writing to disk. Welcome to the reality of Kubernetes networking in 2019. It is not magic. It is a complex layer of iptables rules, encapsulation headers, and routing tables that will punish you for ignoring the underlying infrastructure.
I have spent the last three months debugging a cluster for a fintech client in Oslo. Their issue wasn't code. It was the network overlay choking on cheap VPS hardware. Here is how we fixed it, and how you can avoid the same trap.
The CNI Jungle: Flannel vs. Calico
Kubernetes doesn't handle networking itself; it offloads it to CNI (Container Network Interface) plugins. In 2019, you primarily have two choices for self-hosted clusters: Flannel or Calico.
Flannel is the "easy" button. It creates a simple VXLAN overlay. It works, until you hit high traffic. VXLAN wraps every packet inside another UDP packet (packet-in-packet encapsulation), which adds CPU overhead to every single network request. On a dedicated server, you might not notice. On a noisy, shared VPS with "stolen" CPU cycles, this overhead kills latency.
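Before blaming the overlay alone, check whether the node itself is losing cycles to neighbors. A minimal check, assuming the standard procps vmstat is installed on the node; watch the last column:

# The last column, "st" (steal), is the share of time the hypervisor gave
# your vCPU to another tenant. Persistently non-zero values mean the VXLAN
# encapsulation work is competing with noisy neighbors for CPU.
vmstat 1 5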
We prefer Calico. It uses BGP (Border Gateway Protocol) to route packets without encapsulation where possible (Layer 3 routing). If you are running on CoolVDS, where the underlying KVM network is robust, you can often disable IPIP encapsulation entirely for raw speed.
Here is a common configuration mistake. Many leave the IPIP mode set to Always, even when all nodes share the same subnet. Change it to CrossSubnet so encapsulation is only used for traffic that actually crosses a subnet boundary:
# calico.yaml snippet
- name: CALICO_IPV4POOL_IPIP
  value: "CrossSubnet"
Pro Tip: If you are hosting in Norway to serve Norwegian customers, latency to the NIX (Norwegian Internet Exchange) is critical. Every millisecond of encapsulation overhead negates the benefit of hosting locally. Don't add software latency to a low-latency physical connection.
The Silent Killer: Etcd and Disk I/O
This is technically storage, but it manifests as a networking failure. Kubernetes stores the state of the entire cluster (including network configurations and service endpoints) in etcd.
etcd is extremely sensitive to disk write latency. If fsync takes too long, etcd misses heartbeats and churns through leader elections, nodes go NotReady, and the network map stops updating. In standard VPS environments using spinning HDDs or network-attached Ceph storage, this is a disaster waiting to happen.
The documentation for Kubernetes 1.15 explicitly warns that etcd needs low disk write latency. We tested this using fio on a standard shared host versus a CoolVDS NVMe instance.
The Benchmark Command:
fio --rw=write --ioengine=sync --fdatasync=1 \
--directory=test-data --size=22m --bs=2300 \
--name=mytest
The Results:
| Metric | Standard VPS (SATA SSD) | CoolVDS (NVMe) |
|---|---|---|
| IOPS | ~450 | ~8,200 |
| 99th Percentile fsync Latency | 12ms | 0.8ms |
| Etcd Stability | Frequent Leader Loss | Stable |
If the 99th percentile of etcd's WAL fsync metric (exposed as etcd_disk_wal_fsync_duration_seconds) exceeds 10ms, your cluster is unstable. This is why we insist on local NVMe storage for KVM instances running master nodes. It is not about speed; it is about keeping the cluster alive.
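You can read that histogram straight off etcd's metrics endpoint. A minimal sketch, assuming you are on the master node with a kubeadm-style certificate layout under /etc/kubernetes/pki/etcd; adjust paths and the endpoint for your own setup:

# Pull the WAL fsync latency histogram from etcd's metrics endpoint.
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
     --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
     --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
     https://127.0.0.1:2379/metrics \
  | grep etcd_disk_wal_fsync_duration_seconds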
Ingress Controllers: Tuning NGINX
Most of you use the nginx-ingress-controller. It is battle-tested. However, the default config is conservative. If you are handling large file uploads or heavy API payloads, you will hit the default body size limits or timeouts.
You don't need to rebuild the image. Use a ConfigMap to inject global configurations. Here is the production configuration we use for high-traffic endpoints:
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  proxy-body-size: "50m"
  proxy-connect-timeout: "10"
  proxy-read-timeout: "120"
  proxy-send-timeout: "120"
  use-http2: "true"
  worker-processes: "4"  # Match your CoolVDS vCPU count
Note the worker-processes directive. NGINX workers are single-threaded event loops. If you have a 4-vCPU CoolVDS instance but leave NGINX defaulting to a single worker, you are leaving 75% of the CPU you pay for idle during SSL termination.
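A quick way to confirm the ConfigMap was picked up is to dump the rendered nginx.conf from inside the controller pod and check the directives it maps to (proxy-body-size becomes client_max_body_size). The pod name below is a placeholder, so list yours first:

kubectl -n ingress-nginx get pods
# Dump the live configuration and grep for the values that matter.
kubectl -n ingress-nginx exec <nginx-ingress-controller-pod> -- nginx -T \
  | grep -E 'worker_processes|client_max_body_size'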
Security: The "Deny All" Default
With GDPR in force and scrutiny from Datatilsynet only increasing, you cannot have a flat network where the frontend can talk directly to the database. That is a compliance nightmare.
By default, Kubernetes allows all pod-to-pod traffic. You must implement a NetworkPolicy. The first thing you should apply to any namespace is a "Deny All" policy, then whitelist traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
Once this is applied, silence falls. Nothing moves. You then explicitly allow the frontend to talk to the backend on port 8080. This granular control is only possible if your CNI plugin supports it (another reason to choose Calico over basic Flannel).
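Here is a sketch of that whitelist rule; the app labels and namespace are assumptions, so match them to your own manifests:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  # Selects the backend pods this policy protects.
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080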
DNS: The Achilles Heel
In Kubernetes 1.15, CoreDNS is the standard. But have you looked at your ndots configuration? Kubernetes sets ndots:5 in every pod's /etc/resolv.conf, so any name with fewer than five dots is run through the search list before being tried as-is, and in K8s that search list is long (namespace.svc.cluster.local, svc.cluster.local, cluster.local, plus whatever the node adds).
If your application code calls curl http://api, the system tries:
api.default.svc.cluster.local
api.svc.cluster.local
api.cluster.local
This triples the load on your DNS pods. To fix it, use a name the resolver treats as absolute. In practice that means the FQDN with a trailing dot (http://api.default.svc.cluster.local.), because with ndots:5 even the full service name has only four dots and would still trigger the search walk. Skipping the walk reduces CoreDNS load and latency significantly.
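If rewriting every URL is not practical, you can instead lower ndots per pod via dnsConfig, which Kubernetes 1.15 supports alongside the default ClusterFirst DNS policy. A minimal sketch; the pod name and image are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: api-client
  namespace: production
spec:
  containers:
  - name: app
    image: alpine:3.10
    command: ["sleep", "3600"]
  dnsConfig:
    options:
    # Names with at least one dot are now tried as absolute first,
    # so FQDNs skip the search walk entirely.
    - name: ndots
      value: "1"

With ndots lowered to 1, short names like api still resolve through the search list, while anything already written as an FQDN goes straight to CoreDNS once.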
Conclusion: Infrastructure Matters
Kubernetes is software, but it lives on hardware. You cannot overlay a complex distributed system on top of oversold, high-latency hosting and expect stability. The abstractions leak.
When we build clusters for clients, we don't look for the provider with the fanciest UI. We look for KVM virtualization (to prevent neighbor interference), low-latency local NVMe (for etcd health), and clean network pipes.
If you are struggling with unexplainable timeouts or slow API responses, stop looking at your Go code. Look at your packet drops. Look at your disk I/O wait times.
Don't let slow I/O kill your cluster. Deploy a KVM-based, NVMe-backed instance on CoolVDS in 55 seconds and see the difference raw performance makes.