Taming the Microservices Hydra: A Guide to Service Mesh in 2017

We need to have an honest conversation about microservices. Two years ago, we all started chopping up our monoliths because Netflix told us to. Now, instead of one large application failing, we have thirty small services failing in creative, distributed ways that are impossible to debug. The network is not reliable. Latency is not zero. Bandwidth is not infinite.

If you are running a distributed stack in production today, you know the pain. You deploy a new pricing service, and suddenly the checkout service starts timing out, but only for 5% of users in Trondheim. Why? Who knows. The logs are scattered across ten different containers.

Enter the Service Mesh. It is the buzzword of the moment here in early 2017, but beneath the hype lies a critical architectural pattern: decoupling the application logic from the network logic. In this guide, I will walk you through implementing a basic mesh using Linkerd (the current frontrunner) on Kubernetes, and why your underlying hardware choices—specifically NVMe storage and raw CPU—matter more than ever.

The Architecture: Why a Mesh?

In a traditional setup, your application code contains libraries like Hystrix or Finagle to handle retries, timeouts, and circuit breaking. That works if every service runs on the JVM. But what if your frontend is Node.js, your backend is Go, and your legacy auth system is Python?

You do not want to reimplement retry logic in five different languages. A service mesh pushes this logic into a proxy layer. Your app talks to localhost, and the proxy handles the rest.

Pro Tip: Do not implement a service mesh just because it is trendy. The overhead is real. If you have fewer than 5 microservices, stick to Nginx and Consul. If you have 50, a mesh is mandatory for survival.

The Implementation: Linkerd on Kubernetes 1.5

Right now, Linkerd is the most mature option we have. It is built on Twitter's Finagle, which has been battle-tested at scale. However, it runs on the JVM. This means it eats RAM for breakfast. If you are trying to run this on cheap, oversold VPS hosting where "2GB RAM" actually means "2GB until the host node swaps," you are going to crash.

This is where CoolVDS becomes the reference architecture. Because Linkerd adds latency (JVM GC pauses + network hop), you need to offset that with lightning-fast I/O and dedicated CPU cycles. We see clients deploying Linkerd on our KVM instances in Oslo because the NVMe backing stores handle the logging throughput without blocking the CPU.

Step 1: The Config (linkerd.yaml)

We need to configure how Linkerd routes traffic. In Linkerd, we use dtabs (delegation tables). It's a steep learning curve, but powerful.

admin:
  port: 9990

# The io.l5d.k8s namer resolves the /#/io.l5d.k8s paths in the dtabs
# below. It reaches the Kubernetes API through a local `kubectl proxy`
# on port 8001 (run as a sidecar in the DaemonSet, see Step 2).
namers:
- kind: io.l5d.k8s
  experimental: true
  host: localhost
  port: 8001

routers:
- protocol: http
  label: outgoing
  dtab: |
    /svc => /#/io.l5d.k8s/default/http;
  interpreter:
    kind: default
    transformers:
    # Send outgoing traffic to the Linkerd pod on the *destination*
    # node. "port" is the named port on the l5d Service ("incoming"),
    # not the outgoing port number.
    - kind: io.l5d.k8s.daemonset
      namespace: default
      port: incoming
      service: l5d
  servers:
  - port: 4140
    ip: 0.0.0.0
- protocol: http
  label: incoming
  dtab: |
    /svc => /#/io.l5d.k8s/default/http;
  interpreter:
    kind: default
    transformers:
    # Only route to pods running on this node.
    - kind: io.l5d.k8s.localnode
  servers:
  - port: 4141
    ip: 0.0.0.0

telemetry:
- kind: io.l5d.prometheus
- kind: io.l5d.recentRequests
  sampleRate: 0.25
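
The DaemonSet in the next step mounts this file from a ConfigMap named l5d-config, so the config has to be stored under the key config.yaml. A minimal sketch of the wrapper (the names must match the volume references used below; you can equally create it with kubectl create configmap l5d-config --from-file=config.yaml=linkerd.yaml):

apiVersion: v1
kind: ConfigMap
metadata:
  name: l5d-config
data:
  config.yaml: |-
    admin:
      port: 9990
    # ...paste the rest of the router config from above here...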

Step 2: Deploying the DaemonSet

Instead of a sidecar per pod (which is too heavy with the current JVM version of Linkerd), we will run one Linkerd instance per node using a Kubernetes DaemonSet.

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  labels:
    app: l5d
  name: l5d
spec:
  template:
    metadata:
      labels:
        app: l5d
    spec:
      volumes:
      - name: l5d-config
        configMap:
          name: "l5d-config"
      containers:
      - name: l5d
        image: buoyantio/linkerd:0.8.6
        args:
        - /io.buoyant/linkerd/config/config.yaml
        ports:
        - name: outgoing
          containerPort: 4140
          hostPort: 4140
        - name: incoming
          containerPort: 4141
        - name: admin
          containerPort: 9990
        volumeMounts:
        - name: "l5d-config"
          mountPath: "/io.buoyant/linkerd/config"
          readOnly: true
      # Sidecar: `kubectl proxy` gives the io.l5d.k8s namer local,
      # authenticated access to the Kubernetes API on localhost:8001.
      - name: kubectl
        image: buoyantio/kubectl:v1.4.0
        args:
        - "proxy"
        - "-p"
        - "8001"
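
One more piece: the io.l5d.k8s.daemonset transformer in the config looks up a Service named l5d to discover the per-node proxies, so one must exist with matching named ports. A minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: l5d
spec:
  selector:
    app: l5d
  ports:
  - name: outgoing
    port: 4140
  - name: incoming
    port: 4141
  - name: admin
    port: 9990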

Step 3: Routing Traffic

Now, your application pods need to send traffic to the Linkerd instance on their node. You can do this by setting the `http_proxy` environment variable in your application deployment:

env:
# NODE_NAME must be defined *before* http_proxy references it;
# Kubernetes resolves $(VAR) substitutions in declaration order.
- name: NODE_NAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
- name: http_proxy
  value: $(NODE_NAME):4140
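
In context, that snippet sits in the container spec of each application Deployment. For a hypothetical hello service (the name and image here are placeholders, not from any real manifest), it looks like this:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: hello            # hypothetical example service
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: example/hello:1.0   # placeholder image
        env:
        # Downward API: inject the node's name, then proxy all
        # outbound HTTP through the Linkerd instance on that node.
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: http_proxy
          value: $(NODE_NAME):4140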

Performance Reality Check & The Norwegian Context

Let's talk about the elephant in the room: Latency.

Routing every request through a local proxy adds milliseconds. In a chain of 5 microservices, that adds up. If your servers are hosted in Germany or the Netherlands, but your customers are in Bergen or Oslo, you are already fighting a 30ms latency penalty due to physics.

This is why hosting location matters. CoolVDS infrastructure is peered directly at NIX (Norwegian Internet Exchange). By keeping your compute close to the user, you buy yourself the latency budget you need to pay the mesh's toll.