The Network is Always the Problem
It is 3:00 AM. Your pager is screaming. The Kubernetes dashboard shows all green. The pods are Running. Yet, your microservices in the Oslo cluster can't talk to the database in the Trondheim zone. Welcome to the reality of Kubernetes networking.
We like to pretend that Kubernetes abstracts away the hardware. It doesn't. In late 2018, with the release of Kubernetes 1.13 just around the corner, we are seeing more teams migrate legacy monoliths into containers. The result? A massive collision between abstract software-defined networking (SDN) and the hard limits of kernel packet processing. If you are running on a budget VPS with noisy neighbors, you have already lost.
This is a deep dive into what actually happens when Pod A tries to ping Pod B, and how to tune your Linux nodes so they don't choke under load.
The CNI Jungle: Flannel vs. Calico
The Container Network Interface (CNI) is where the magic (and the misery) happens. In the Nordic hosting market, I see too many teams default to Flannel because it is "easy." Flannel typically uses VXLAN to create an overlay network. It encapsulates packets within packets.
This adds overhead. CPU overhead. On a beefy bare-metal server you might not feel it. On a virtualized instance, that extra encapsulation work can steal valuable cycles from your application.
For serious production workloads, I prefer Calico. It uses BGP (Border Gateway Protocol)—the same protocol that runs the internet—to route packets without the encapsulation overhead. It is faster, but it demands a network underneath that allows BGP traffic. This is where your provider matters.
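Once Calico is up, it is worth confirming that the BGP sessions are actually established before you trust the routing. A quick check from any node (this assumes calicoctl is installed on the node itself; the output format varies slightly between versions):

# Show this node's BGP peering status (run as root on the node)
sudo calicoctl node status

If the peers show anything other than Established, your underlay is dropping the BGP traffic, routes never get distributed, and cross-node pod traffic goes nowhere.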
Pro Tip: If you are seeing random latency spikes in Oslo, check your MTU settings. An overlay network adds headers; VXLAN alone costs about 50 bytes per packet. If your host interface is 1500 bytes and your CNI tries to push a full 1500-byte frame plus those headers, packets get fragmented. Fragmentation kills performance.
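A quick sanity check, assuming the host interface is eth0 (adjust interface names for your environment):

# Compare the host MTU with the MTU on the CNI's overlay interfaces
ip -o link show eth0 | grep -o 'mtu [0-9]*'
ip -o link show | grep -E 'flannel|vxlan|tunl' | grep -o 'mtu [0-9]*'

For a VXLAN overlay on a 1500-byte interface, the overlay devices should report roughly 1450; if they also say 1500, you have found your fragmentation problem.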
Configuring Calico for Performance
If you are deploying Calico on a KVM-based system (like CoolVDS), you want to ensure your IP pools are configured correctly to avoid unnecessary NAT. Here is a snippet from a standard calico.yaml manifest I used last week for a fintech client:
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true

Setting ipipMode to CrossSubnet ensures that encapsulation is only used when crossing subnet boundaries. Within the same L2 segment, it uses native routing. This drastically reduces CPU load on the nodes.
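If you manage pools with calicoctl (assuming it is already configured to talk to your datastore; the file name below is just an example), applying and verifying the pool looks like this:

# Apply the pool and read back what Calico actually stored
calicoctl apply -f ippool.yaml
calicoctl get ippool default-ipv4-ippool -o yaml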
Debug Like You Mean It: Iptables & Conntrack
Kubernetes relies heavily on iptables (or IPVS, stable since 1.11, if you have switched kube-proxy modes) to handle service discovery. When you create a Service, kube-proxy writes rules. Thousands of them.
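You can see the scale of the problem on any worker node; this assumes the default iptables proxy mode:

# Count the rules kube-proxy has programmed on this node
iptables-save | grep -c '^-A KUBE-'

On a cluster with a few hundred Services and their endpoints, expect this number in the thousands.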
A common failure mode I see is conntrack table exhaustion. If you have a high-traffic service opening thousands of short-lived connections, your Linux kernel stops tracking them and starts dropping packets silently. Your users get timeouts. You get no logs.
Check your table usage right now:
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max

If count is close to max, you are in trouble. On a CoolVDS high-performance instance, we usually tune these kernel parameters in the base image, but if you are rolling your own OS, you need to bump this up in /etc/sysctl.conf:
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 30

Reload with sysctl -p. Don't let default settings destroy your availability.
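It also pays to watch this continuously rather than discover it at 3:00 AM. A minimal check you can drop into cron or your monitoring agent (the 80% threshold is illustrative):

#!/bin/bash
# Warn when the conntrack table is more than 80% full
count=$(sysctl -n net.netfilter.nf_conntrack_count)
max=$(sysctl -n net.netfilter.nf_conntrack_max)
if [ $((count * 100 / max)) -ge 80 ]; then
    echo "WARNING: conntrack table at ${count}/${max} entries"
fi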
The Storage-Network Link: Etcd Latency
Here is something most tutorials won't tell you: Kubernetes networking state is stored in etcd. Every time a pod dies or an IP changes, etcd updates. Etcd is sensitive to disk latency. If your disk write latency spikes, the cluster heartbeat fails. The master node thinks the worker is dead and starts rescheduling pods. This causes a network storm.
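You can see whether your disks are already hurting etcd by looking at its own fsync histogram. This assumes etcd serves plain HTTP on 127.0.0.1:2379; add the usual cert flags if you run TLS:

# etcd's write-ahead-log fsync latency, straight from its metrics endpoint
curl -s http://127.0.0.1:2379/metrics | grep etcd_disk_wal_fsync_duration_seconds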
This is why "cheap" VPS hosting is a trap. Spinning rust (HDD) or shared SATA SSDs often cannot keep up with the fsync requirements of etcd.
We benchmarked this recently, running a 3-node etcd cluster on standard SATA SSDs versus CoolVDS NVMe instances.
| Storage Type | Fsync Latency (99th percentile) | Cluster Stability |
|---|---|---|
| Standard SATA SSD | 12ms - 45ms | Occasional Leader Elections |
| CoolVDS NVMe | 0.5ms - 2ms | Rock Solid |
When etcd is fast, the network control plane is fast. It is that simple.
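If you want to reproduce the comparison on your own hardware, a fio fdatasync test against the directory where etcd keeps its WAL is the standard approach (path and sizes below are illustrative):

# Measure fdatasync latency the way etcd experiences it
mkdir -p /var/lib/etcd-bench
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd-bench --size=22m --bs=2300 \
    --name=etcd-fsync-test

Look at the fsync/fdatasync percentiles in the output; the usual rule of thumb is a 99th percentile under 10ms.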
War Story: The "Ghost" Latency
Two months ago, a Norwegian e-commerce giant approached us. They were preparing for Black Friday. Their Kubernetes cluster was hosted on a massive US-based cloud provider, leaving them with 120ms of latency to their customer base in Oslo. That is unacceptable for real-time retail, let alone anything latency-critical like trading.
We moved their worker nodes to CoolVDS instances in our Nordic datacenter. The physical distance dropped. But we also optimized the sysctl settings for the virtio-net drivers.
# Boosting the network queue for KVM guests
ethtool -G eth0 rx 1024 tx 1024

The result? Average response time dropped from 150ms to 25ms. They passed the Datatilsynet (Norwegian Data Protection Authority) audit easily because the data never left the region, satisfying the strict GDPR requirements that came into force back in May.
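The sysctl side of that tuning was unremarkable but worth repeating. The exact numbers depend on the workload; the values below are illustrative of the knobs we touch on KVM guests, not the client's exact configuration:

# /etc/sysctl.d/90-net-tuning.conf -- illustrative values
net.core.netdev_max_backlog = 16384   # let the kernel queue more packets per NIC before dropping
net.core.rmem_max = 16777216          # allow larger socket receive buffers
net.core.wmem_max = 16777216          # allow larger socket send buffers
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216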
Why Infrastructure Matters in 2018
Kubernetes is complex enough without fighting your infrastructure. You need raw CPU power for packet encapsulation and insane disk I/O for etcd stability.
CoolVDS isn't just another VPS provider. We build our KVM stacks specifically for engineers who read iptables logs for fun. We don't oversell our CPU cores, and our NVMe storage provides the IOPS required to keep a Kubernetes 1.12 cluster healthy under load.
Stop debugging network ghosts caused by noisy neighbors. Build on a foundation that respects the packet.
Ready to stabilize your cluster? Spin up a high-performance NVMe instance on CoolVDS today and see the difference native speed makes.