Kubernetes Networking Deep Dive: Surviving the Overlay Chaos in 2019

Let’s be honest for a second. Everyone loves the idea of Kubernetes. The declarative YAML, the self-healing pods, the promise of infinite scale. But the moment you move from a Minikube instance on your laptop to a production cluster spanning multiple nodes, things get ugly. I'm talking about networking.

I recently spent three sleepless nights debugging a cluster for a fintech client in Oslo. Their microservices were timing out randomly. The logs showed nothing. The application code hadn't changed. The culprit? A misconfigured CNI plugin trying to push VXLAN packets over a cheap VPS provider's already congested network interface.

Kubernetes assumes a flat network where every pod can talk to every other pod. But physics doesn't care about your assumptions. If you don't understand what's happening at Layer 3 and Layer 4, you aren't running Kubernetes; you're just praying to the cloud gods. Today, we are going to dissect the networking stack, look at the state of CNIs in late 2019, and explain why the underlying infrastructure—like the KVM-based instances we engineer at CoolVDS—matters more than your choice of Ingress controller.

The CNI Battlefield: Flannel vs. Calico vs. Weave

In 2019, the Container Network Interface (CNI) landscape is crowded. If you are deploying a cluster today (Kubernetes 1.15), you are likely choosing between Flannel, Calico, or Weave Net.

Flannel is the old reliable. It creates a simple overlay network using VXLAN. It encapsulates your Layer 2 frames inside Layer 4 UDP packets. It's easy to set up, but that encapsulation comes with a CPU cost. On a high-traffic node, the kernel spends a significant amount of time wrapping and unwrapping packets.
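
For reference, the backend type lives in the net-conf.json embedded in Flannel's kube-flannel-cfg ConfigMap. A minimal sketch, assuming the common 10.244.0.0/16 pod CIDR (yours may differ):

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }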

Calico, on the other hand, offers pure Layer 3 routing using BGP. No encapsulation (if your network supports it). This is where performance lives. However, many VPS providers block BGP peering between instances, and some also filter the IP-in-IP (protocol 4) traffic that Calico falls back to.

Pro Tip: If your provider blocks BGP peering, fall back to IP-in-IP encapsulation with Calico. It adds only 20 bytes of overhead versus VXLAN's 50, but it still demands a consistent MTU configuration.
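
In the stock calico.yaml, that choice is controlled by an environment variable on the calico-node container. A minimal DaemonSet fragment, assuming the default IPv4 pool ("CrossSubnet" is often enough when your nodes share an L2 segment):

env:
  # Enable IP-in-IP encapsulation on the default IPv4 pool
  - name: CALICO_IPV4POOL_IPIP
    value: "Always"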

Here is a snippet from a standard calico.yaml manifest we use for production deployments, specifically tuning the MTU to account for the overhead:

kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Typha is disabled.
  typha_service_name: "none"
  # Configure the Calico backend to use.
  calico_backend: "bird"
  # Configure the MTU. Customizing this is critical for overlay performance.
  # Standard Ethernet is 1500 bytes. IP-in-IP adds 20 bytes of overhead; VXLAN adds 50.
  # 1450 is a safe value for either encapsulation on a 1500-byte underlay.
  veth_mtu: "1450"
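
Once this is rolled out, confirm the value actually reached the pod interfaces. The pod name below is just a placeholder for any running pod in your cluster:

# Check the MTU the pod's eth0 actually received
kubectl exec my-api-pod -- cat /sys/class/net/eth0/mtu
# Expected output: 1450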

The Hidden Killer: MTU Fragmentation

This brings me to the war story. That Oslo fintech client? Their issue was MTU fragmentation. The physical network interface on the host had an MTU of 1500. The overlay network also tried to push 1500-byte packets. Add the VXLAN header, and the packet size jumped to 1550 bytes.

The result? The host OS had to fragment every single packet. Throughput dropped by 40%. Latency spiked.

To diagnose this, you can't rely on kubectl. You need to get onto the node and use tcpdump. Here is the command I use to spot fragmentation issues:

# Match IPv4 packets with the "more fragments" flag set or a non-zero fragment offset
tcpdump -i eth0 -vvn '(ip[6:2] & 0x3fff) != 0'

If you see output here, your pods are trying to send packets larger than the wire can handle. You must lower the MTU inside the CNI configuration.
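
A quick sanity check for the usable path MTU is a ping with the don't-fragment bit set, run from one node towards a pod on another node. The payload size below assumes a 1500-byte interface minus 50 bytes of VXLAN overhead, 20 bytes of IP header and 8 bytes of ICMP header; the target address is just an example pod IP:

# 1500 - 50 (VXLAN) - 20 (IP) - 8 (ICMP) = 1422 bytes of payload
ping -M do -s 1422 -c 3 10.244.1.15
# If this fails while a smaller size (e.g. -s 1300) succeeds, you have an MTU mismatch.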

IPVS: The Scalability Shift

Until recently, Kubernetes used iptables to handle Service routing. Every time you created a Service, kube-proxy added iptables rules. In a cluster with 5,000 services, the kernel had to traverse a massive sequential list of rules for every packet. It was slow.
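
You can see the scale of the problem directly on a node. kube-proxy programs at least a couple of NAT rules per Service, so this rough count grows linearly with the cluster:

# Rough count of the per-Service chains and rules kube-proxy has programmed
iptables-save -t nat | grep -c 'KUBE-SVC'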

As of Kubernetes 1.11, IPVS (IP Virtual Server) went GA. IPVS uses hash tables instead of linear lists. It is O(1) instead of O(n). In 2019, if you are building a production cluster, you must enable IPVS mode in kube-proxy.

Here is how you verify if your nodes are actually using IPVS:

# Check the kube-proxy logs
kubectl logs -n kube-system -l k8s-app=kube-proxy | grep "Using ipvs Proxier"

# Or check the IPVS table on the node
ipvsadm -Ln

If you see an empty table, you are likely falling back to iptables mode, often because the kernel modules ip_vs, ip_vs_rr, or ip_vs_wrr aren't loaded.
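
On a kubeadm-managed cluster, enabling IPVS is roughly a two-step job: load the modules on every node, then flip the mode in the kube-proxy ConfigMap and recreate the proxy pods. A sketch; double-check the ConfigMap layout against your own cluster:

# Load the IPVS kernel modules (persist them via /etc/modules-load.d/ for reboots)
modprobe -a ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack_ipv4

# Set mode: "ipvs" in the kube-proxy configuration...
kubectl edit configmap kube-proxy -n kube-system

# ...then recreate the kube-proxy pods so they pick up the change
kubectl delete pod -n kube-system -l k8s-app=kube-proxy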

Why Infrastructure Matters: The CoolVDS Difference

You can tune your CNI and enable IPVS, but you cannot software-engineer your way out of bad hardware. Kubernetes is incredibly sensitive to "noisy neighbors." If another customer on your host is hammering the disk I/O or saturating the 10Gbps uplink, your etcd latency will spike. If etcd latency gets too high, the API server starts timing out, leader elections churn, and the control plane becomes unreliable.

This is why we built CoolVDS differently.

1. True NVMe Isolation

We don't use spinning rust or shared SATA SSDs for our high-performance tiers. We use local NVMe storage. etcd requires low fsync latency (ideally under 10ms at the 99th percentile). On standard cloud block storage, we often see spikes to 50ms+. On CoolVDS NVMe instances, we consistently measure write latency in the microseconds.
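
Don't take any provider's word for it, ours included; measure it. The fio invocation below follows the widely used etcd disk benchmark (an fdatasync after every small write, roughly mimicking etcd's WAL pattern). Point --directory at the disk that holds, or will hold, your etcd data:

fio --name=etcd-wal-test --directory=/var/lib/etcd --rw=write \
    --ioengine=sync --fdatasync=1 --bs=2300 --size=22m
# Check the fsync/fdatasync latency percentiles in the output;
# the 99th percentile should stay well under 10ms.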

2. The Norway Advantage

Latency isn't just about disk; it's about physics. If your users are in Oslo, Bergen, or Trondheim, routing traffic through a data center in Frankfurt or Ireland adds 20-30ms of round-trip time. By hosting on VPS Norway infrastructure, you are physically closer to the Norwegian Internet Exchange (NIX).

3. Data Sovereignty and GDPR

The legal landscape is tightening. With the GDPR in full effect since May 2018 and the Norwegian Data Protection Authority (Datatilsynet) becoming increasingly strict about data transfers, knowing exactly where your bits live is paramount. Hosting on CoolVDS ensures your data remains on Norwegian soil, governed by Norwegian law, not buried in an opaque availability zone owned by a US giant.

Configuring NGINX Ingress for High Loads

Finally, let's talk Ingress. The standard NGINX Ingress Controller is great, but the defaults are conservative. If you are expecting high traffic, you need to tune the ConfigMap.

Here is a battle-tested configuration we deploy for high-throughput API gateways:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  # Increase the body size limit
  proxy-body-size: "50m"
  # Enable keepalives to upstream to reduce TCP handshake overhead
  upstream-keepalive-connections: "100"
  upstream-keepalive-timeout: "30"
  # Optimize buffers for larger payloads
  proxy-buffer-size: "16k"
  # Important for long-lived connections (WebSockets)
  proxy-read-timeout: "3600"
  proxy-send-timeout: "3600"
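
Assuming you saved the manifest above as nginx-configuration.yaml, apply it and confirm the values landed in the rendered configuration. The controller reloads NGINX on its own; the pod name below is a placeholder for whatever your controller Deployment generates:

kubectl apply -f nginx-configuration.yaml
# Confirm the generated nginx.conf picked up the new settings
kubectl exec -n ingress-nginx nginx-ingress-controller-abc123 -- \
    grep -E 'keepalive|proxy_read_timeout' /etc/nginx/nginx.conf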

Conclusion

Kubernetes networking is complex, but it is manageable if you respect the layers. Verify your MTU, use IPVS for scale, and don't underestimate the impact of the physical hardware underneath your virtual nodes.

When you are ready to stop fighting with network latency and start deploying, you need a foundation that respects your engineering standards. CoolVDS offers the raw NVMe power and low-latency network connectivity that production Kubernetes clusters demand.

Don't let slow I/O kill your cluster's heartbeat. Deploy a high-performance KVM instance on CoolVDS today and see the difference single-digit millisecond latency makes.