Taming the Whale: A Realistic Look at Container Orchestration in 2014

It has been exactly one month since Docker hit version 1.0 at DockerCon. The hype cycle is deafening. Every developer I know is suddenly packaging their applications into containers, handing me a tarball (or pushing to the Hub), and saying, "It works on my machine, just run it."

If only it were that simple. Running docker run on a laptop is one thing. Managing fifty containers across a cluster of VPS instances while maintaining uptime? That is where the headaches start. The ecosystem is fragmented, and "best practices" change weekly.

In this post, we are cutting through the noise. We aren't talking about Google's shiny new "Kubernetes" project that just dropped in alpha—it's too raw for production. We are looking at what works today, right now, for deploying distributed systems on serious infrastructure. We will compare the lightweight distributed init system Fleet (CoreOS) against the battle-tested method of wrapping Docker in Configuration Management (Chef/Puppet/Ansible).

The Problem: The "Pet" vs. "Cattle" Paradigm Shift

We used to treat servers like pets. We named them (Gandalf, Frodo, Bilbo), nursed them to health when they crashed, and manually tweaked their /etc/my.cnf. Containers force us to treat infrastructure like cattle. If a container dies, you don't ssh in to fix it. You shoot it and spin up a new one.

But who holds the gun? And who opens the gate for the new cattle?

Contender 1: The New Kid – CoreOS & Fleet

CoreOS is fascinating. It's a minimal OS designed purely for running containers. It doesn't even have a package manager. No apt-get, no yum. Everything is a container.

The magic glue here is etcd (a distributed key-value store) and Fleet. Fleet treats your entire cluster as if it were a single init system. It extends systemd across the network.
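
If you want to see what that glue actually does, talk to etcd directly. A quick sketch, assuming the default 2014-era client port 4001 and an illustrative key name:

# Write a value into the cluster-wide key-value store
etcdctl set /example/message "hello from node 1"

# Read it back from any other machine in the cluster
etcdctl get /example/message

# The same data is reachable over plain HTTP
curl -L http://127.0.0.1:4001/v2/keys/example/message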

Pro Tip: Never run etcd on a single node in production. You need at least three nodes to keep quorum through a single failure. If you lose quorum, etcd stops accepting writes and Fleet can no longer schedule anything.
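
For reference, the cloud-config for each of those three machines looks roughly like this. It follows the stock CoreOS examples; the discovery URL is a placeholder you generate yourself at https://discovery.etcd.io/new:

#cloud-config
coreos:
  etcd:
    # one fresh token per cluster
    discovery: https://discovery.etcd.io/<your-token>
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start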

Here is how you actually define a service in Fleet. You create a standard systemd unit file, but you add an [X-Fleet] section.

Example: High-Availability Nginx Unit

Create a file called myapp.service:

[Unit]
Description=My Dockerized Web App
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill myapp
ExecStartPre=-/usr/bin/docker rm myapp
ExecStartPre=/usr/bin/docker pull coolvds/nginx-custom:latest
ExecStart=/usr/bin/docker run --name myapp -p 80:80 coolvds/nginx-custom:latest
ExecStop=/usr/bin/docker stop myapp

[X-Fleet]
# Never schedule another unit matching myapp* on the same machine
Conflicts=myapp*.service

To deploy this, you don't SSH into a specific server. You sit on your laptop and run:

fleetctl submit myapp.service
fleetctl start myapp.service

Fleet finds an idle machine in your cluster (running on CoolVDS KVM instances, ideally) and schedules the unit there. If that node dies, Fleet reschedules the unit on a surviving machine; what it will not do is move your data or update your load balancer, so that part of the failover story is still yours. It's powerful, but debugging distributed systemd errors can be brutal.
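
A few fleetctl commands take some of the mystery out of it. And if you want more than one copy of the service, the usual Fleet pattern is a template unit: rename the file to myapp@.service and start numbered instances, which the Conflicts glob above keeps on separate machines. A rough sketch:

# Where did Fleet put it, and what else is in the cluster?
fleetctl list-units
fleetctl list-machines

# Tail the container's logs without caring which host it landed on
fleetctl journal -f myapp.service

# Scale out via a template unit
fleetctl submit myapp@.service
fleetctl start myapp@1.service myapp@2.service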

Contender 2: The Old Guard – Ansible / Puppet / Chef

Many of us aren't ready to re-image our entire infrastructure to CoreOS. We are running Ubuntu 14.04 LTS or the brand new CentOS 7. We already have thousands of lines of Puppet code or Ansible playbooks.

The pragmatic approach in 2014 is often to just use Docker as a package format, but let your existing configuration management (CM) tools handle the orchestration. You know exactly where your containers are running.

Example: Managing Docker with Ansible

Instead of relying on a magical scheduler, you define the state. Here is a simple Ansible task to ensure your Redis container is running:

- name: Ensure Redis Container is running
  docker:
    image: redis:2.8
    name: redis_cache
    state: running
    ports:
    - "6379:6379"
    volumes:
    - "/data/redis:/data"
    dns:
    - 8.8.8.8
    - 8.8.4.4
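
One gotcha: the Ansible docker module (the 1.x-era one) drives the Docker remote API through docker-py, which must be installed on the managed host. The run itself is ordinary Ansible; the inventory and playbook names below are just placeholders:

# On the target hosts (or via a pre_tasks pip task)
pip install docker-py

# From your workstation
ansible-playbook -i production.ini site.yml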

This is deterministic. You know exactly which host runs Redis. The downside? No automatic failover. If the host dies, you get paged at 3:00 AM.

The Underlying Infrastructure: Why Virtualization Matters

Whether you choose Fleet or Ansible, you cannot ignore the layer below. In the VPS market, especially here in Norway, there is a lot of "overselling" happening. Providers stack hundreds of customers onto a single physical server using OpenVZ.

Do not run Docker on OpenVZ.

I cannot stress this enough. Docker relies on kernel features like cgroups and namespaces. On OpenVZ, you are sharing a kernel with every other customer on that host. You will run into version conflicts, module limitations, and potential security leaks. You need Hardware Virtualization (KVM).
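
If you are not sure what your current provider actually sold you, check from inside the guest before building anything on it. virt-what is in the standard Ubuntu/Debian repositories:

apt-get install virt-what
virt-what
# "kvm" means you have your own kernel; "openvz" means you are sharing one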

When we build infrastructure at CoolVDS, we use KVM exclusively. This gives your Docker host its own private kernel. You can load specific kernel modules (like aufs or btrfs) without begging support to do it for you.
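
You can verify that freedom for yourself. A quick sketch; whether aufs exists as a module depends on the distribution kernel (Ubuntu 14.04 ships it, vanilla upstream kernels do not):

# Confirm the module is available and loaded
modprobe aufs
lsmod | grep aufs

# Confirm which copy-on-write backend the Docker daemon actually picked
docker info | grep 'Storage Driver'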

Performance Tuning for 2014 Hardware

Docker creates I/O overhead. The copy-on-write filesystem (Device Mapper or AUFS) can be slow if your underlying storage is spinning rust (HDD). We are seeing more datacenters deploy SSDs, but they are often cached behind RAID controllers that introduce latency.
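
The driver itself is a daemon-level choice. A sketch for the lxc-docker packaging, where daemon flags live in /etc/default/docker (the file and service name vary with the package you installed):

# /etc/default/docker
DOCKER_OPTS="--storage-driver=aufs"

# Apply it
service docker restart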

To check your disk I/O latency, install ioping:

apt-get install ioping

Then run a check against your volume:

# Check I/O latency
ioping -c 10 .

# Check sequential write speed (vital for Docker image pulls)
dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct

If you are seeing latency above 5ms on a "high performance" VPS, you are being throttled. Our benchmark tests on CoolVDS local SSD storage consistently show sub-millisecond latency, which is critical when fifty containers are trying to write logs simultaneously.

Data Sovereignty and The "Datatilsynet" Factor

Since the Snowden revelations last year, the conversation in Oslo has shifted. Norwegian CTOs are nervous about hosting data in US-owned clouds. The EU Data Protection Directive (95/46/EC) is strict, but the Norwegian Personopplysningsloven is even stricter.

Latency is also a legal argument. If your customer base is in Scandinavia, routing traffic through Frankfurt or London adds unnecessary milliseconds. But more importantly, keeping data on Norwegian soil simplifies compliance with Datatilsynet audits.

When you orchestrate containers, you must ensure your scheduler doesn't accidentally spawn a database container in a zone that violates your data handling agreements. Fleet handles this with machine metadata: tag each host with its location, then pin the unit to that tag in its [X-Fleet] section:

[X-Fleet]
MachineMetadata=region=no-oslo
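
The region tag itself is advertised by the hosts. A sketch of the relevant cloud-config fragment (the same setting can also live in /etc/fleet/fleet.conf):

#cloud-config
coreos:
  fleet:
    metadata: "region=no-oslo"

Run fleetctl list-machines afterwards and the metadata column shows which hosts are eligible, before you trust them with customer data.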

This ensures your sensitive data never leaves the jurisdiction.

Conclusion: Choose Your Weapon

If you are a startup willing to bleed a little, CoreOS and Fleet offer a glimpse into the future of warehouse-scale computing. It is elegant, but complex.

If you are an enterprise that needs stability above all else, stick to Ubuntu 14.04 and Ansible/Chef. Use Docker to solve the "dependency hell," not to redesign your entire network architecture.

Whichever path you choose, the foundation remains the same. You need a KVM-based VPS with pure SSD storage and a network that doesn't choke on packet fragmentation. Do not let your infrastructure be the bottleneck for your brilliant code.

Ready to test your cluster? Deploy a KVM instance on CoolVDS in Oslo today. Spin up time is currently averaging 55 seconds.