Taming the Microservices Hydra: A Guide to Service Mesh in 2017
We need to have an honest conversation about microservices. Two years ago, we all started chopping up our monoliths because Netflix told us to. Now, instead of one large application failing, we have thirty small services failing in creative, distributed ways that are impossible to debug. The network is not reliable. Latency is not zero. Bandwidth is not infinite.
If you are running a distributed stack in production today, you know the pain. You deploy a new pricing service, and suddenly the checkout service starts timing out, but only for 5% of users in Trondheim. Why? Who knows. The logs are scattered across ten different containers.
Enter the Service Mesh. It is the buzzword of the moment here in early 2017, but beneath the hype lies a critical architectural pattern: decoupling the application logic from the network logic. In this guide, I will walk you through implementing a basic mesh using Linkerd (the current frontrunner) on Kubernetes, and why your underlying hardware choices—specifically NVMe storage and raw CPU—matter more than ever.
The Architecture: Why a Mesh?
In a traditional setup, your application code contains libraries like Hystrix or Finagle to handle retries, timeouts, and circuit breaking. This is fine if you are a 100% JVM shop. But what if your frontend is Node.js, your backend is Go, and your legacy auth system is Python?
You do not want to reimplement retry logic in five different languages. A service mesh pushes this logic into a proxy layer. Your app talks to localhost, and the proxy handles the rest.
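In practice, "talking to localhost" is just pointing a standard HTTP proxy variable at the mesh. A minimal sketch, assuming Linkerd is listening on localhost:4140 and a hypothetical pricing service is routable through it:

    # The proxy resolves the logical hostname ("pricing") via its routing
    # rules; the application never needs to know where the service lives.
    http_proxy=localhost:4140 curl -s http://pricing/quote

The application code shrinks to a plain HTTP call; retries, timeouts, and load balancing all happen in the proxy.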
Pro Tip: Do not implement a service mesh just because it is trendy. The overhead is real. If you have fewer than 5 microservices, stick to Nginx and Consul. If you have 50, a mesh is mandatory for survival.
The Implementation: Linkerd on Kubernetes 1.5
Right now, Linkerd is the most mature option we have. It is built on Twitter's Finagle, which has been battle-tested at scale. However, it runs on the JVM. This means it eats RAM for breakfast. If you are trying to run this on cheap, oversold VPS hosting where "2GB RAM" actually means "2GB until the host node swaps," you are going to crash.
This is where CoolVDS becomes the reference architecture. Because Linkerd adds latency (JVM GC pauses + network hop), you need to offset that with lightning-fast I/O and dedicated CPU cycles. We see clients deploying Linkerd on our KVM instances in Oslo because the NVMe backing stores handle the logging throughput without blocking the CPU.
Step 1: The Config (linkerd.yaml)
We need to configure how Linkerd routes traffic. Linkerd uses dtabs (delegation tables) for this; the learning curve is steep, but they are powerful. The config below follows the standard per-node pattern: an outgoing router on port 4140 that local applications talk to, and an incoming router on port 4141 that receives traffic from other nodes and delivers it to pods on the local node. The dtab rule maps a request for /svc/foo to the Kubernetes service foo's http port in the default namespace.
admin:
  port: 9990
namers:
# Resolves /#/io.l5d.k8s paths against the Kubernetes API,
# reached via the kubectl-proxy sidecar on localhost:8001.
- kind: io.l5d.k8s
  experimental: true
  host: localhost
  port: 8001
routers:
- protocol: http
  label: outgoing
  dtab: |
    /svc => /#/io.l5d.k8s/default/http;
  interpreter:
    kind: default
    transformers:
    # Send to the Linkerd on the destination pod's node, on its
    # incoming port, so the request does not loop through outgoing again.
    - kind: io.l5d.k8s.daemonset
      namespace: default
      port: incoming
      service: l5d
  servers:
  - port: 4140
    ip: 0.0.0.0
- protocol: http
  label: incoming
  dtab: |
    /svc => /#/io.l5d.k8s/default/http;
  interpreter:
    kind: default
    transformers:
    # Deliver only to pods on this node.
    - kind: io.l5d.k8s.localnode
  servers:
  - port: 4141
    ip: 0.0.0.0
telemetry:
- kind: io.l5d.prometheus
- kind: io.l5d.recentRequests
  sampleRate: 0.25
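Save this as linkerd.yaml and load it into a ConfigMap so the DaemonSet in the next step can mount it. Note the key name: the container args below expect the mounted file to be called config.yaml.

    # Store the config under the key config.yaml, which becomes the
    # file name inside the mounted volume.
    kubectl create configmap l5d-config --from-file=config.yaml=linkerd.yaml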
Step 2: Deploying the DaemonSet
Instead of a sidecar per pod (which is too heavy with the current JVM version of Linkerd), we will run one Linkerd instance per node using a Kubernetes DaemonSet.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d
  name: l5d
spec:
  template:
    metadata:
      labels:
        app: l5d
    spec:
      volumes:
      - name: l5d-config
        configMap:
          name: "l5d-config"
      containers:
      - name: l5d
        # The telemetry plugin section in our config needs a 0.9.x image.
        image: buoyantio/linkerd:0.9.0
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: outgoing
          containerPort: 4140
          hostPort: 4140
        - name: incoming
          containerPort: 4141
          hostPort: 4141
        - name: admin
          containerPort: 9990
        volumeMounts:
        - name: "l5d-config"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true
      # Sidecar that proxies the Kubernetes API to localhost:8001
      # for the io.l5d.k8s namer.
      - name: kubectl
        image: buoyantio/kubectl:v1.4.0
        args:
        - "proxy"
        - "-p"
        - "8001"
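Two more pieces complete the deployment: a Service named l5d, whose named outgoing/incoming/admin ports are what the daemonset transformer in our config looks up, and the kubectl apply itself. A sketch modeled on the upstream linkerd-examples manifests:

    # Service fronting the DaemonSet; the daemonset transformer resolves
    # the named "incoming" port through it.
    apiVersion: v1
    kind: Service
    metadata:
      name: l5d
    spec:
      selector:
        app: l5d
      ports:
      - name: outgoing
        port: 4140
      - name: incoming
        port: 4141
      - name: admin
        port: 9990

Then deploy everything (assuming you saved the DaemonSet and Service together as l5d.yaml):

    kubectl apply -f l5d.yaml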
Step 3: Routing Traffic
Now, your application pods need to send traffic to the Linkerd instance on their node. You can do this by setting the `http_proxy` environment variable in your application deployment:
env:
# NODE_NAME must be declared before http_proxy references it:
# $(VAR) expansion only sees variables defined earlier in the list.
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
- name: http_proxy
  value: $(NODE_NAME):4140
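To confirm traffic is actually flowing through the mesh, send a request via the proxy and then inspect Linkerd's admin interface. A sketch, assuming a service named hello exists in the default namespace:

    # Send a request through the node-local proxy (run this inside a pod
    # that has the NODE_NAME variable from the step above); the Host
    # header ("hello") is what the dtab matches on.
    http_proxy=$NODE_NAME:4140 curl -s http://hello/

    # From your workstation, port-forward one l5d pod and pull the
    # Prometheus-format metrics exposed by the telemetry config.
    kubectl port-forward $(kubectl get po -l app=l5d -o jsonpath='{.items[0].metadata.name}') 9990
    curl -s http://localhost:9990/admin/metrics/prometheus | head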
Performance Reality Check & The Norwegian Context
Let's talk about the elephant in the room: Latency.
Routing every request through a local proxy adds latency; budget one to two milliseconds per hop once the JVM has warmed up. In a chain of 5 microservices, that is roughly 5-10 ms added to every user request before any real work happens. If your servers are hosted in Germany or the Netherlands but your customers are in Bergen or Oslo, you are already fighting a ~30 ms round-trip penalty due to physics.
This is why hosting location matters. CoolVDS infrastructure is peered directly at NIX (the Norwegian Internet Exchange). By keeping your compute close to the user, you buy yourself the latency budget to spend on the mesh instead of the wire.