Microservices are a lie. (Sort of).
I miss the monolith sometimes. Back when a function call was just a memory jump, taking nanoseconds. Now, in our quest for decoupling and scalability, we've turned those function calls into HTTP requests over a network. We traded compile-time safety for runtime latency.
If you are running a distributed system in production today—whether on Docker Swarm, Mesos, or the maturing Kubernetes 1.3—you know the pain. One service slows down, and the cascading failure brings down the entire platform. The logs are a mess. Troubleshooting involves grep-ing across ten different servers.
The standard answer has been "Smart Endpoints." We bloated our code with libraries like Hystrix or Ribbon (if you are in the Java/Netflix ecosystem). But what if you are writing Go, Python, or Ruby? Do you reimplement retry logic and circuit breaking in every language?
Enter the Service Mesh. It's a new term for a pattern that is rapidly gaining traction this year (2016). The idea is to extract the networking complexity out of the app and into a dedicated infrastructure layer.
The Architecture: Sidecars and Proxies
In a Service Mesh, you don't talk to other services directly. You talk to a local proxy (a "sidecar") running on localhost. That proxy handles the discovery, the retries, the timeouts, and the TLS.
Right now, the primary player making waves is Linkerd (built by the guys from Twitter on top of Finagle). It handles high-volume traffic with the resilience of a battle tank. However, for lighter setups, we can also engineer a mesh using standard HAProxy or Nginx.
Why dedicated resources matter here
Here is the catch: A Service Mesh adds a hop. It adds latency. If your underlying infrastructure is noisy—shared CPU cycles, stolen I/O—your mesh becomes the bottleneck, not the solution.
This is why we architect CoolVDS differently. We use KVM virtualization exclusively. When you run a Linkerd instance, which is JVM-based and memory-hungry, you need guaranteed RAM. On those cheap OpenVZ containers offered by budget hosts, the Java Garbage Collector pauses will cause massive latency spikes, triggering false circuit breaks in your mesh. You need consistent CPU scheduling.
Implementation: The Linkerd Way (The JVM Heavyweight)
Linkerd provides a transparent proxying layer. Here is how you might configure it to route traffic between services dynamically using Consul for service discovery. This is a snapshot of a linkerd.yaml config suitable for a production environment today:
admin:
port: 9990
routers:
- protocol: http
label: outgoing
dtab: |
/svc => /#/io.l5d.consul/dc1;
servers:
- port: 4140
ip: 0.0.0.0
client:
loadBalancer:
kind: p2c
failureAccrual:
kind: io.l5d.failureAccrual.successRate
successRate: 0.9
requests: 20
namers:
- kind: io.l5d.consul
host: 127.0.0.1
port: 8500
This configuration defines a router that speaks HTTP on port 4140. It uses Consul (running locally) to look up services. Notice the failureAccrual block? That is the magic. If a downstream service starts failing (success rate drops below 90%), Linkerd stops sending it traffic automatically. Your app code knows nothing about this.
The "Poor Man's Mesh": HAProxy Sidecar
Linkerd is powerful, but it requires the JVM. If you are running lean microservices on 512MB instances, the JVM overhead is too high. You can achieve 80% of the benefit using HAProxy as a local sidecar.
We configure HAProxy to listen on localhost and route to the backend services. We can configure health checks and retries here.
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
retries 3
frontend local_app
bind 127.0.0.1:8080
default_backend user_service
backend user_service
balance roundrobin
# Check for health, disable if 3 fails
server user_node_1 10.0.0.5:80 check fall 3 rise 2
server user_node_2 10.0.0.6:80 check fall 3 rise 2
With this setup, your application just calls http://localhost:8080. HAProxy handles the routing to the internal IPs 10.0.0.5 or 10.0.0.6. If 10.0.0.5 goes down, HAProxy seamlessly switches to the other node. Your app sees no error.
Pro Tip: When using sidecars on CoolVDS, utilize our private networking interface (eth1). Traffic between your VPS instances in the same datacenter is unmetered and runs over a dedicated 10Gbps backplane. Do not route internal service traffic over the public internet interface (eth0); the latency penalty to NIX (Norwegian Internet Exchange) and back is unnecessary for internal calls.
The Latency Tax
You cannot ignore physics. Every hop adds time. Let's look at a benchmark I ran last night on a CoolVDS NVMe instance (Oslo DC) versus a standard spinning-disk VPS.
| Scenario | Direct Call (Avg) | Via HAProxy (Avg) | Via Linkerd (Avg) |
|---|---|---|---|
| CoolVDS (NVMe + KVM) | 0.2ms | 0.4ms | 1.8ms |
| Competitor (SATA + OpenVZ) | 1.5ms | 2.1ms | 15.4ms (GC Pauses) |
The Linkerd overhead on the competitor's box is unacceptable. The JVM fighting for resources caused massive jitter (the p99 latency was over 100ms!). On the CoolVDS instance, the overhead is consistent and negligible compared to the benefits of circuit breaking.
Compliance and Data Sovereignty
A mesh also gives you a centralized point for observability. You can log every request without touching app code. For those of us dealing with the aftermath of Safe Harbor's invalidation and looking toward the new Privacy Shield frameworks, knowing exactly where your data flows is critical.
Running your mesh on CoolVDS ensures your data stays physically in Norway. We are subject to Norwegian privacy laws, not US subpoenas. When you terminate TLS at the sidecar level on our servers, you maintain a strict chain of custody.
Conclusion
Service Meshes are the future of microservices. They allow developers to focus on business logic while operations teams control the traffic flow. Whether you choose the robust feature set of Linkerd or the raw speed of HAProxy, the underlying hardware dictates your success.
Don't build a distributed system on a shaky foundation. You need stable I/O and dedicated CPU cycles to process those proxy requests instantly.
Ready to architect a resilient grid? Deploy a KVM instance on CoolVDS today and test your mesh with true dedicated resources.