Kubernetes Networking Deep Dive: Surviving the Packet Jungle in 2019
Let’s be honest: Kubernetes networking is where most of your production outages are hiding. You can have the most elegant microservices architecture in Oslo, but if your CNI plugin misfires or your Ingress controller chokes on headers, you're just burning cash on idle CPU cycles. I’ve spent the last six months migrating a high-traffic fintech platform from bare metal to K8s v1.15, and the networking layer was the only thing that kept me up at night.
Most VPS providers sell you "cloud" but deliver oversubscribed shared networking that creates unpredictable latency spikes. When you are running a distributed system, latency isn't just an annoyance; it’s a failure state. If your etcd cluster hits high I/O wait or network lag, heartbeats time out, leader elections start churning, and the API server stops answering. Game over.
This is a deep dive into the plumbing of Kubernetes networking, focusing on stability, performance, and why the underlying hardware—specifically KVM-based virtualization like we use at CoolVDS—is the only way to run this reliably.
The CNI Battlefield: Flannel vs. Calico
Choosing a Container Network Interface (CNI) is the first decision that will haunt you later if you get it wrong. In late 2019, we are seeing two primary contenders for serious production workloads: Flannel and Calico.
Flannel is simple. Its default backend wraps every packet in VXLAN encapsulation. It works, but that encapsulation costs CPU and shaves bytes off your MTU. On a standard VPS with "noisy neighbors," the per-packet processing adds up, and a high-throughput API can easily give back 10-15% of its performance just wrapping and unwrapping packets.
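You can see the cost directly on a Flannel node. The VXLAN device trims the MTU to make room for the outer headers, and checking it takes seconds (this sketch assumes the default flannel.1 device name and an eth0 uplink; adjust for your setup):
# Show the VXLAN device Flannel creates; -d prints the vxlan details
ip -d link show flannel.1
# Compare its MTU with the uplink: typically 1450 vs 1500, because VXLAN
# needs roughly 50 bytes per packet for the outer UDP/IP/VXLAN headers
ip link show eth0 | grep mtu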
That is why I prefer Calico for production. It uses BGP (Border Gateway Protocol) to route packets without the encapsulation overhead, assuming your network supports it. It’s raw, fast, and supports Network Policies (which Flannel doesn't out of the box). Here is how we verify our BGP peers are established in a Calico setup:
calicoctl node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+-------------+
| 192.168.0.1 | node-to-node mesh | up | 14:20:01 | Established |
| 192.168.0.2 | node-to-node mesh | up | 14:20:03 | Established |
+--------------+-------------------+-------+----------+-------------+
Pro Tip: If you are running on CoolVDS, our KVM architecture exposes the necessary kernel modules to run Calico in pure Layer 3 mode. We don’t block the BGP traffic between your private LAN nodes. Try doing that on a restrictive container-based host.
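To confirm you are actually routing rather than tunnelling, check the IP pool configuration. A quick sketch, assuming calicoctl v3.x and the default pool name created by the standard manifests; you want IPIP (and VXLAN, if your version shows that column) set to Never:
# List pools with their encapsulation mode
calicoctl get ippool -o wide
# Full detail for the default pool; look for ipipMode: Never
calicoctl get ippool default-ipv4-ippool -o yaml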
Service Discovery: When DNS Lies to You
Since Kubernetes 1.13, CoreDNS has been the default, replacing kube-dns. It’s extensible and generally more reliable, but it is extremely sensitive to latency. I recently debugged an issue for a client in Stavanger where their PHP applications were timing out connecting to MySQL. The database was fine. The network was fine. The problem? DNS lookups were taking 3 seconds.
The ndots:5 issue in Linux is usually the culprit. By default, K8s sets search domains in /etc/resolv.conf such that a lookup for google.com triggers multiple internal searches first (google.com.default.svc.cluster.local, etc.) before hitting the external network.
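You can see it for yourself from inside any Pod. The output below is what a typical kubeadm cluster injects; your nameserver IP and search list will differ, but the ndots:5 option is the part that hurts:
kubectl exec -it <any-pod> -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5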
You can mitigate this by optimizing the dnsConfig in your Pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: optimization-test
spec:
  containers:
  - name: test
    image: nginx:1.17
  dnsConfig:
    options:
    - name: ndots
      value: "2"
This forces the resolver to treat domains with at least 2 dots as absolute, skipping the internal search path loop. It’s a small change that dropped average latency from 150ms to 5ms for external API calls.
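If you want to measure the effect yourself instead of trusting my numbers, a throwaway debug pod does the job. This is only a sketch: tutum/dnsutils is an example image, so swap in whatever debug container you normally trust:
# One-shot pod with dig installed; --rm cleans it up when dig exits
kubectl run dns-debug --image=tutum/dnsutils --restart=Never --rm -it -- \
  dig +search +stats api.github.com
# +search honours the Pod's resolv.conf (ndots, search list), so the
# "Query time" line reflects what your applications actually experience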
Ingress: NGINX is Still King
While Traefik v2 has just dropped and looks promising, the NGINX Ingress Controller remains the battle-tested standard for 2019. It handles the nuances of HTTP/2 and gRPC better than most alternatives right now.
However, the default configuration is rarely enough for a high-traffic site. If you are handling large file uploads or heavy POST bodies, you will hit the dreaded 413 Request Entity Too Large error. You need to inject configuration directly into the NGINX block via annotations.
Here is a snippet from a production Ingress I deployed last week for a media agency:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: media-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "20"
    # Enable client certificate authentication for admin area
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-secret: "default/ca-secret"
spec:
  rules:
  - host: api.coolvds-client.no
    http:
      paths:
      - backend:
          serviceName: upload-service
          servicePort: 80
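Once that is deployed, it is worth verifying both knobs from the outside. The /upload path, certificate filenames, and test file below are placeholders; the point is that a client certificate is now required, and a ~40 MB body passes where the default 1m limit would have answered with a 413:
# Without a client certificate, the request should be rejected
curl -vk https://api.coolvds-client.no/upload
# With the certificate and a large body, the 50m proxy-body-size applies
curl -vk --cert client.crt --key client.key \
  -F "file=@big-video.mp4" \
  https://api.coolvds-client.no/upload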
Security: The GDPR Firewall
In Norway, data sovereignty is critical. Under GDPR, you must ensure that services that shouldn't talk to each other can't talk to each other. Kubernetes has a flat network structure by default—every Pod can talk to every other Pod. That is a security nightmare waiting to happen.
You must implement NetworkPolicies. Think of this as a firewall distributed to every single container. If you are using Calico (as recommended above), this is enforced at the kernel level via iptables or IP sets.
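If you are curious what "enforced at the kernel level" actually looks like, SSH into a worker node after applying a policy and look for Calico's chains and sets. This is a sanity check only, assuming the default iptables dataplane:
# Calico programs policy into iptables chains prefixed with "cali-"
sudo iptables-save | grep -c "cali-"
# The selector matches live in ipsets whose names also start with "cali"
sudo ipset list | grep "Name: cali"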
Start with a "Default Deny" policy for every namespace. It breaks everything initially, but it forces you to whitelist only necessary traffic. It’s the only way to be compliant.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
Once applied, traffic stops. You then explicitly allow traffic, for example, allowing your frontend to talk to your backend API on port 8080:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
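After the policy lands, prove it behaves the way you think it does. The sketch below assumes a ClusterIP Service named backend in front of the API pods and an image that ships wget; adjust the names for your cluster:
# Pick one frontend pod and hit the backend Service on the allowed port
FRONTEND_POD=$(kubectl -n production get pod -l app=frontend \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n production exec "$FRONTEND_POD" -- wget -qO- --timeout=3 http://backend:8080/
# Repeat from a pod that is NOT labelled app=frontend; it should time out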
The Hardware Reality: Why IOPS Matter for K8s
You cannot talk about Kubernetes networking without talking about etcd. Etcd is the brain of your cluster. It stores the state of the entire system. It requires extremely low latency to write data to disk (fsync). If your disk latency spikes, etcd heartbeats fail, the API server becomes unresponsive, and your networking rules (stored in etcd) stop updating.
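etcd will tell you when the disk is too slow, if you ask. Its metrics endpoint exposes a histogram of WAL fsync durations; the certificate paths below are the kubeadm defaults, so treat them as an example:
# Pull etcd's own view of fsync latency on a control-plane node
curl -s --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
  https://127.0.0.1:2379/metrics | grep wal_fsync_duration_seconds
# If a meaningful share of samples lands above the 0.01 (10ms) bucket,
# your disk is the problem, not your network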
This is where the "cheap VPS" trap kills projects. Providers using standard SSDs over a shared storage network (SAN) often have fluctuating I/O latency. For a WordPress blog, it's fine. For Kubernetes, it's fatal.
At CoolVDS, we use local NVMe storage with direct passthrough to the KVM instance. We see write latencies in the microseconds, not milliseconds. When you run fio benchmarks on our nodes versus a standard cloud instance, the difference in IOPS is staggering.
| Metric | Standard Cloud VPS (SATA SSD) | CoolVDS (Local NVMe) |
|---|---|---|
| Random Read IOPS (4k) | ~5,000 | ~80,000+ |
| Random Write Latency | 2-5 ms | 0.05 ms |
| Etcd fsync stability | Warning logs common | Rock solid |
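You can reproduce that comparison on any box. The run below mirrors the commonly recommended etcd-style fsync test (small writes with an fdatasync after each one) rather than a generic throughput benchmark; the directory is an example, so point it at the disk etcd would actually use:
# Simulate etcd's write pattern: small sequential writes, fdatasync per write
mkdir -p /var/lib/etcd-bench
fio --name=etcd-fsync-test --directory=/var/lib/etcd-bench \
  --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Read the fsync/fdatasync percentiles in the output; the 99th percentile
# should stay well below 10ms for etcd to be happy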
The CoolVDS Advantage for Nordic Devs
Latency matters. If your users are in Oslo, Bergen, or Trondheim, routing your traffic through a data center in Frankfurt or Amsterdam adds unnecessary milliseconds. CoolVDS infrastructure is optimized for the Nordic region, peering directly at major exchange points like NIX. This ensures that your `kubectl` commands feel instant and your user traffic takes the shortest path.
Furthermore, because we provide full KVM virtualization, you can load your own kernel modules. Need to enable ip_vs modules to run kube-proxy in IPVS mode for better scalability than iptables? You can do that here:
# On a CoolVDS node, verify IPVS support
lsmod | grep ip_vs
# Load modules if needed
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
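With the modules loaded, switching kube-proxy over is just a configuration change. On kubeadm clusters the setting lives in the kube-proxy ConfigMap (under the config.conf key); the relevant fragment looks roughly like this:
# Fragment of the KubeProxyConfiguration; an empty mode falls back to iptables
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr"  # round-robin; wrr or sh also work if you loaded those modules
Restart the kube-proxy pods afterwards and run ipvsadm -Ln on a node; if virtual servers show up for your ClusterIPs, IPVS mode is live.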
Try asking for that level of kernel access on a managed container platform. You won't get it.
Final Thoughts
Kubernetes in 2019 is powerful, but it effectively turns you into a network engineer. You have to care about MTU sizes (especially if tunneling over VPNs), DNS search paths, and BGP routing. But above all, you need a foundation that doesn't shift under your feet.
Don't build your cathedral on a swamp. Ensure your underlying infrastructure guarantees the IOPS and network stability your cluster demands. If you are ready to stop debugging network flakes and start deploying code, it is time to upgrade.
Deploy a high-performance NVMe KVM instance on CoolVDS today and give your Kubernetes cluster the home it deserves.