Taming the Whale: A 2014 Guide to Docker Orchestration in Production
Let’s be honest: running docker run on your laptop is fun. It’s clean. It’s isolated. But trying to run a multi-node cluster in production right now? It’s an absolute mess. If you are like me, you spent the last six months of 2014 fighting with port mapping, linking containers across hosts, and praying the Docker daemon doesn't hang (again).
The container revolution is here, but the tools to manage it are still in their infancy. While everyone is talking about the new Kubernetes project Google open-sourced back in June, it’s arguably still too alpha for anyone valuing their sleep. So, what do we actually use today to manage distributed containers without losing our minds?
I’ve tested three approaches on CoolVDS NVMe instances to see what holds up under load: the "dumb" Ansible approach, the CoreOS/Fleet method, and the heavy-lifting Apache Mesos.
1. The "Just Use Ansible" Approach
Sometimes, boring is best. If you aren't Google, you might not need a scheduler. You just need configuration management. We rely heavily on Ansible at CoolVDS for our internal tooling because it’s agentless. You can simply wrap your Docker commands in standard playbooks.
The problem isn't starting the container; it's handling the networking when you have a database on Host A and a web worker on Host B. Since Docker links don't span hosts natively yet, we have to manage ports manually.
Here is a snippet from a playbook I used last week to deploy a Redis cluster. Note the explicit port binding to the host interface:
- name: Start Redis Container
  docker:
    name: redis_master
    image: redis:2.8
    state: started
    ports:
      - "6379:6379"
    volumes:
      - /data/redis:/data
  register: redis_container

- name: Update Firewall for Redis
  ufw:
    rule: allow
    port: 6379
    proto: tcp
    src: "{{ web_worker_ip }}"
The Verdict: It works, but it's rigid. If a node dies, Ansible won't automatically reschedule the workload unless you wake up and run the playbook again. For static environments, it's fine. For dynamic scaling, it fails.
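There is a crude workaround, and I stress it's a sketch rather than a real scheduler: re-run the playbook from cron so a dead container at least comes back on the next pass. The paths, user, and playbook name below are placeholders for illustration.
# /etc/cron.d/reconverge-redis -- re-run the playbook every 10 minutes.
# 'state: started' is idempotent: healthy containers are untouched, missing ones are recreated.
# (Adjust the ansible-playbook path and user to match your install.)
*/10 * * * * deploy /usr/bin/ansible-playbook -i /etc/ansible/hosts /opt/playbooks/redis.yml >> /var/log/ansible-cron.log 2>&1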
2. The "Distributed Systemd": CoreOS & Fleet
This is where things get interesting. CoreOS is stripping Linux down to the bare essentials, and Fleet effectively treats your entire cluster as one giant init system. It’s clever. You write systemd unit files, and Fleet decides where to run them.
We've seen a lot of Norwegian dev teams adopting this because it feels native. You don't learn a new API; you just learn systemd. However, it relies heavily on etcd for consensus, and if your disk I/O latency is high, etcd's log fsyncs and heartbeats slow down until the cluster starts churning through leader elections. This is a common support ticket we see.
Pro Tip: Never run etcd on standard spinning rust (HDD). The fsync latency will cause leader elections to time out, and a cluster stuck in election churn can't schedule anything. We strictly provision CoolVDS instances with local SSD/NVMe storage for this exact reason. If `iowait` spikes, your cluster dies.
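If you suspect the disk, a minimal sanity check (assuming etcd is listening on its classic client port 4001 and you have sysstat installed) is to poll etcd's stats endpoint while watching the device:
# Ask the local etcd member about its state and who it currently sees as leader.
curl -s http://127.0.0.1:4001/v2/stats/self
# Watch disk latency for five seconds under load; sustained high await/%util is a red flag.
iostat -x 1 5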
Here is a fleet unit template (myapp@.service). You launch several instances of it (myapp@1, myapp@2, and so on), and the [X-Fleet] section forces each instance onto a separate machine:
[Unit]
Description=My Nginx App
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill web_app
ExecStartPre=-/usr/bin/docker rm web_app
ExecStartPre=/usr/bin/docker pull coolvds/nginx-custom:1.0
ExecStart=/usr/bin/docker run --name web_app -p 80:80 coolvds/nginx-custom:1.0
ExecStop=/usr/bin/docker stop web_app
[X-Fleet]
Conflicts=myapp@*.service
The Conflicts directive is powerful. It tells Fleet: "Don't schedule an instance of this unit on a machine that is already running one." Instant HA.
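Getting the template into the cluster and spread across machines is a couple of fleetctl calls (the instance names after the @ are arbitrary; I just use numbers):
# Load the unit template into the cluster.
fleetctl submit myapp@.service
# Start two instances; the Conflicts rule guarantees they land on different machines.
fleetctl start myapp@1.service myapp@2.service
# See which machine each instance was scheduled on.
fleetctl list-units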
3. The Heavyweight: Apache Mesos + Marathon
If you are building the next Twitter, you use Mesos. It abstracts CPU, memory, and disk away from the machines. You use Marathon as the framework to launch Docker containers on top of Mesos.
It is robust, but the learning curve is a brick wall. Setting up ZooKeeper (which Mesos requires for master election) is not a Friday afternoon task. However, once it's running, it is bulletproof. We recently helped a client migrate a high-traffic e-commerce site to Mesos on our infrastructure. They needed to handle the holiday traffic spikes without manual intervention.
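To be fair, the ZooKeeper config itself is the easy part; the pain is in operating it. Here is a rough sketch of what a three-node ensemble needs, with hostnames and paths as placeholders rather than recommendations:
# Minimal zoo.cfg for a 3-node ensemble (identical file on every node).
cat > /etc/zookeeper/conf/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.internal:2888:3888
server.2=zk2.internal:2888:3888
server.3=zk3.internal:2888:3888
EOF
# Each node also needs its own id (1, 2 or 3) matching its server.N line above.
echo 1 > /var/lib/zookeeper/myid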
Posting a JSON payload to the Marathon API looks like this:
{
  "id": "frontend",
  "cpus": 0.5,
  "mem": 512.0,
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "coolvds/frontend:v2",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 80, "hostPort": 0, "servicePort": 9000, "protocol": "tcp" }
      ]
    }
  }
}
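Pushing that definition at Marathon is one HTTP call. The hostname below is a placeholder; I'm assuming the JSON above is saved as frontend.json and Marathon is on its default port 8080:
# Create the app; Marathon replies with the definition it stored and starts scheduling tasks.
curl -s -X POST http://marathon.internal:8080/v2/apps \
     -H "Content-Type: application/json" \
     -d @frontend.json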
Notice the "hostPort": 0. Marathon picks a free port from the resources Mesos offers on that host instead of you hard-coding one. This necessitates a service discovery mechanism (like HAProxy or Consul) to find where your containers actually landed. Complexity increases, but so does scalability.
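Marathon will also tell you where everything landed, which is exactly what a tool generating HAProxy config would poll (same placeholder hostname as above):
# One entry per running task, including the host and the ports Mesos handed out.
curl -s http://marathon.internal:8080/v2/apps/frontend/tasks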
Infrastructure Matters: The KVM Difference
No matter which orchestrator you choose—Fleet, Mesos, or standard Ansible—your containers are only as stable as the kernel they share. This is where the "Noisy Neighbor" effect kills performance.
Many providers oversell their container-based VPS (OpenVZ). If another user on the node forks a thousand processes, your Docker containers stall. At CoolVDS, we refuse to do that. We use KVM (Kernel-based Virtual Machine) virtualization. Each customer gets their own kernel.
Why KVM is non-negotiable for Docker in 2014:
- Security: Docker 1.3 added security profiles, but escaping a container is still a risk. KVM adds a hard hardware virtualization layer.
- Kernel Modules: Want to use specialized networking or storage drivers? You need your own kernel.
- Data Sovereignty: With the scrutiny on Safe Harbor and the EU Data Protection Directive, keeping your data on physical hardware located in Oslo (and not replicated to a US cloud) is critical for compliance with the Norwegian Personal Data Act.
When you are running a database inside a container (which I generally advise against, but I know you do it anyway), I/O contention becomes the bottleneck. We benchmarked `fio` random writes on our KVM instances versus standard cloud instances. The difference is stark.
# 4k random write test
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test \
--filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randwrite
On CoolVDS, we consistently see IOPS figures that comfortably sustain write-heavy etcd or ZooKeeper clusters. On budget VPS, the same test often chokes, causing leader election failures in your orchestration layer.
Conclusion
We are in a transition period. Fig (Docker bought its maker, Orchard, earlier this year, and it's slated to be relaunched as "Compose") is great for dev, but production is still the Wild West. If you need simple, go with Ansible. If you need clustering, Fleet is the modern choice. If you have a team of ten Ops engineers, Mesos is the beast.
But whatever you run, don't run it on weak foundations. Latency kills distributed systems faster than software bugs.
Ready to build a cluster that doesn't flake out? Deploy a high-performance KVM instance on CoolVDS today and get the raw I/O your containers are starving for.