Untangling the Mesh: A Battle-Hardened Guide to Kubernetes Networking & Performance in 2019

If I had a krone for every time a developer told me "it connects fine locally" while the production cluster is dropping 3% of packets between pods, I could buy my own rack at the NIX (Norwegian Internet Exchange). Kubernetes networking is not magic. It is a terrifyingly complex layer of virtual ethernet devices, overlay encapsulation, and enough iptables rules to make a kernel panic.

In 2019, deploying Kubernetes (K8s) isn't just about kubectl apply. It's about understanding how packets traverse the abyss between nodes. If you are running K8s on underpowered, oversold VPS hosting, you are effectively introducing latency at the hypervisor level before your packet even hits the CNI. Here is how we debug, optimize, and survive K8s networking in high-load environments.

The CNI Battlefield: Flannel vs. Calico

The first decision you make—often blindly—is the Container Network Interface (CNI). Many tutorials tell you to just install Flannel and move on. For a hobby project, sure. For a business running critical workloads in Oslo, Flannel's VXLAN backend is a CPU cycle thief.

VXLAN encapsulates Layer 2 frames inside Layer 4 UDP packets. This encapsulation and decapsulation process (encap/decap) costs CPU. On a noisy public cloud instance where "vCPUs" are time-sliced aggressively, this leads to micro-stalls. Your database query isn't slow; your network stack is waiting for the CPU to unwrap the packet.
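If you suspect this on your own nodes, watch softirq time while pushing pod-to-pod traffic (an iperf3 run between two pods works). A quick check, assuming the sysstat package is installed:

# Per-CPU utilization, refreshed every second; a climbing %soft column on the
# cores doing VXLAN encap/decap is the symptom to look for
mpstat -P ALL 1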

For production, we lean heavily towards Calico using BGP (Border Gateway Protocol) for unencapsulated routing where possible, or IPIP (IP-in-IP) if we must span subnets. It’s leaner and supports Network Policies—essential for GDPR compliance when you need to fence off workloads handling PII (Personally Identifiable Information).
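As a sketch of what that fencing looks like, here is a minimal NetworkPolicy that only lets pods labelled role: api-gateway reach a hypothetical namespace holding PII (the namespace, label, and port below are placeholders):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-pii-ingress
  namespace: billing              # hypothetical namespace holding PII workloads
spec:
  podSelector: {}                 # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: api-gateway   # only these pods may connect
      ports:
        - protocol: TCP
          port: 5432              # e.g. PostgreSQL

Note that a bare podSelector in the from clause only matches pods in the policy's own namespace; add a namespaceSelector if your gateway lives elsewhere.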

Configuring Calico MTU

A classic "gotcha" I see in audits is MTU (Maximum Transmission Unit) mismatch. If your physical interface is 1500 and you add an overlay header, the inner packet must be smaller. If not, fragmentation occurs, and performance falls off a cliff.

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # If your node interface is 1500, set this to 1480 for IPIP overhead
  veth_mtu: "1480"
  # Enable Typha for scaling beyond 50 nodes to reduce load on the API server
  typha_service_name: "calico-typha"
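A quick sanity check after applying this: ping across the overlay with the Don't Fragment bit set. The payload sizes below assume a 1500-byte physical MTU and the 1480-byte IPIP MTU above (28 bytes go to ICMP and IP headers); swap in your own node and pod IPs.

# 1472 + 28 bytes of ICMP/IP headers = 1500; should succeed node-to-node
ping -M do -s 1472 -c 3 <node-ip>
# 1452 + 28 = 1480; should succeed pod-to-pod, while -s 1453 should fail
# cleanly with "message too long" instead of silently fragmenting
ping -M do -s 1452 -c 3 <pod-ip>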

From iptables to IPVS: The Scalability Bottleneck

By default, kube-proxy uses iptables to handle Service discovery and load balancing. In version 1.15, this is still the standard, but it's fundamentally flawed at scale. iptables is a linear list. If you have 5,000 services, the kernel has to traverse that list sequentially for every packet. It is O(n).

We switched our production clusters to IPVS (IP Virtual Server) mode. IPVS is built on netfilter but uses a hash table for lookups, making it O(1). Whether you have 10 services or 10,000, the lookup time is constant.

To enable this, you must ensure the kernel modules are loaded on your worker nodes before K8s starts. On a CoolVDS instance running CentOS 7 or Ubuntu 18.04, you’d prep the node like this:

# Load necessary kernel modules for IPVS
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
# kube-proxy's IPVS mode also needs conntrack; on the 3.x/4.x kernels shipped
# with CentOS 7 and Ubuntu 18.04, the module is nf_conntrack_ipv4
modprobe nf_conntrack_ipv4

# Verify they are loaded
lsmod | grep ip_vs

Then, update your kube-proxy config map:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "rr" # Round Robin
Pro Tip: If you see intermittent connection resets, check your conntrack table limits. sysctl -w net.netfilter.nf_conntrack_max=131072 is often necessary for high-traffic ingress nodes. Standard VPS providers rarely let you tune sysctl parameters this deep; on CoolVDS KVM instances, you have full kernel control.

Debugging DNS Latency (The "5-second delay")

It’s 2019, and CoreDNS has replaced kube-dns, but the "5-second DNS timeout" issue persists on Linux due to race conditions in conntrack when parallel A and AAAA lookups from the same socket are NATed (DNAT to the cluster DNS service IP, or SNAT on the way out). If you see random 5-second latencies in your logs, your application is likely hitting this.

The fix involves forcing TCP for DNS or using the single-request-reopen option in your resolv.conf. However, inside K8s, you manage this via the dnsConfig in your Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-debug
spec:
  containers:
  - name: ubuntu
    image: ubuntu:18.04
    command: ["sleep", "3600"]
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.1.1.1
    searches:
      - ns1.svc.cluster.local
      - my.domain.com
    options:
      - name: ndots
        value: "2"
      - name: single-request-reopen
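To confirm it really is the conntrack race and not a slow upstream resolver, time a batch of lookups from inside an affected pod. getent goes through glibc, which fires the parallel A/AAAA queries that trigger the race; the hostname below is a placeholder:

# kubectl exec -it ubuntu-debug -- bash, then:
for i in $(seq 1 50); do
  start=$(date +%s%N)
  getent hosts my.domain.com > /dev/null
  echo "lookup $i: $(( ($(date +%s%N) - start) / 1000000 )) ms"
done
# Mostly single-digit results with occasional ~5000 ms outliers point at the race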

The Hardware Reality: Why "Cloud" Often Fails K8s

Kubernetes assumes low-latency communication between the control plane (etcd) and the worker nodes. Etcd is incredibly sensitive to disk write latency (fsync). If you run a K8s master on a shared hosting platform with noisy neighbors and spinning rust (HDDs), the leader election will time out, and your cluster will flap.
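A quick way to quantify that sensitivity is the fio fdatasync test the etcd maintainers recommend; run it on the same volume that holds /var/lib/etcd (assumes fio is installed). As a rough rule of thumb, the 99th percentile fdatasync latency should stay under about 10ms.

# Small sequential writes with an fdatasync after each, approximating etcd's WAL
mkdir -p /var/lib/etcd/fio-test
fio --name=etcd-fsync --directory=/var/lib/etcd/fio-test \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
rm -rf /var/lib/etcd/fio-test   # clean up the scratch files afterwards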

This is where infrastructure choice becomes an architectural decision, not just a procurement one. At CoolVDS, we see clients migrating from massive hyperscalers simply because of I/O Wait. Our NVMe storage arrays provide the predictable IOPS required for etcd to maintain quorum without sweating.

Ingress Tuning for High Throughput

Using the NGINX Ingress Controller? The defaults are too conservative for modern apps. If you are handling file uploads or large JSON payloads, you will hit 413 Request Entity Too Large errors.

Add these annotations to your Ingress object:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: production-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "15"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "600"
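If every Ingress in the cluster needs the same limits, it is tidier to set them once in the controller's ConfigMap rather than repeating annotations. A sketch, assuming the stock ingress-nginx deployment (which reads a ConfigMap named nginx-configuration in the ingress-nginx namespace); per-Ingress annotations still override these defaults:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  proxy-body-size: "50m"
  proxy-read-timeout: "600"
  proxy-send-timeout: "600"
  keep-alive-requests: "1000"   # reuse connections under sustained load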

Data Sovereignty and The Norwegian Context

We cannot ignore the legal landscape. With the Datatilsynet (Norwegian Data Protection Authority) tightening enforcement post-GDPR implementation last year, where your packets flow matters. If your Kubernetes cluster spans regions, you might be inadvertently routing internal traffic through a non-compliant zone.

By keeping your nodes within Norway—specifically on infrastructure directly peered at NIX in Oslo—you ensure two things:

  1. Compliance: Data stays within Norwegian jurisdiction.
  2. Speed: Latency from Oslo to Oslo is sub-2ms. Latency from Oslo to Frankfurt can be 20-30ms. For a microservices architecture with 10 internal hops per request, that difference is the user perceiving your app as "instant" vs. "sluggish."

Final Thoughts

Kubernetes is powerful, but it doesn't absolve you of understanding networking fundamentals. It actually demands you know them better. You need to verify MTU settings, choose the right CNI, and tune your sysctl parameters.

But software tuning can only fix so much. If the underlying hypervisor is choking on packet interrupts, no amount of YAML will save you. You need dedicated resources and raw NVMe performance.

Don't let packet loss define your uptime. Spin up a high-performance, K8s-ready instance on CoolVDS today and see what stable latency actually looks like.