The Container Lie: Why Shared Kernels Keep Me Awake at Night
It is February 2014, and my Twitter feed is nothing but "Docker this" and "LXC that." Don't get me wrong—I love the idea of shipping code faster than we can brew coffee. Docker 0.8 just dropped with OS X support, and the buzz is deafening. But as someone who has been managing servers since the days when we had to compile our own kernels just to get decent iptables support, I have a bone to pick with the "containerize everything" crowd.
Containers do not contain. At least, not yet. Not in the way a hypervisor does.
If you are running a mission-critical application for a client in Oslo on a cheap VPS that uses OpenVZ or bare-metal LXC, you are playing Russian roulette with your data. I recently audited a setup for a Norwegian e-commerce giant where a developer had deployed a Docker container with default settings. They thought they were isolated. I showed them how easy it was to crash the host kernel from inside their "secure" container. The look on the CTO's face was priceless.
The Anatomy of the Threat: Namespaces are Not Enough
Linux Containers (LXC)—and by extension, the current Docker engine—rely on kernel namespaces and cgroups (control groups). Namespaces lie to the process about what it can see (PIDs, mounts, network), and cgroups limit what it can use (CPU, RAM). This is brilliant for efficiency, but it is not a security boundary.
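You can watch the namespace illusion in action without any container tooling at all. Assuming a reasonably recent util-linux (2.23+ ships the flags used here), this one-liner runs ps inside a fresh PID namespace:
# Run ps in a brand-new PID namespace with its own /proc mount
sudo unshare --pid --fork --mount-proc ps aux
# Output: one or two processes. The rest of the system is invisible,
# but it is still right there, one shared kernel away.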
The biggest issue we face right now in early 2014 is UID 0 mapping. By default, root inside the container is root on the host. If a process breaks out of the container's filesystem (which happens more often than we'd like to admit), it has the full run of the server. User namespaces, which remap container root to an unprivileged host UID, landed upstream in Linux 3.8, but they are painful or outright impossible to use on the kernels shipped by distros like Ubuntu 12.04 LTS (3.2) or CentOS 6.5 (2.6.32).
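For the curious, here is roughly what the remapping looks like in an LXC config once your kernel and tooling cooperate. A sketch only: the 100000 range below is an example, and it must match the subordinate IDs delegated in /etc/subuid and /etc/subgid.
# Map UIDs/GIDs 0-65535 in the container to 100000-165535 on the host
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
With this in place, "root" in the container is UID 100000 on the host: an unprivileged nobody.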
The "War Story": When a Neighbor Melts Your Database
Last month, I was debugging a MySQL performance issue on a client's legacy VPS. They were hosting on a budget provider using OpenVZ (a container-based virtualization platform). Their database latency spiked every day at 14:00. We dug into the metrics.
The issue wasn't their code. It was a "noisy neighbor" on the same physical host running a massive video transcoding job. Because every container shares the host kernel's scheduler and I/O queues, strict performance isolation is far harder to guarantee than under hardware virtualization. The host kernel was spending so much time context-switching for the neighbor that my client's I/O requests sat in the queue. We moved them to a CoolVDS KVM instance that afternoon, and the latency graph flatlined immediately. Why? Because KVM (Kernel-based Virtual Machine) uses hardware virtualization extensions (Intel VT-x), giving you a dedicated kernel and reserved resources.
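If you suspect the same fate, steal time is the smoking gun. Not every guest kernel reports it, but where it does, vmstat makes a noisy neighbor visible:
# Sample CPU counters once per second; watch the 'st' (steal) column
vmstat 1 5
# An 'st' value consistently above ~10 means the host is handing
# your CPU cycles to another tenant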
Hardening LXC: If You Must Use Containers
If you are committed to the container path (and with LXC 1.0 approaching stable release, many are), you must harden your configuration manually. Do not trust the defaults.
1. Drop Capabilities
The Linux kernel divides root privileges into distinct units called capabilities. A web server does not need to load kernel modules or manipulate the system clock. Drop them.
In your LXC config file (usually found in /var/lib/lxc/container-name/config), add the following:
# Prevent loading kernel modules
lxc.cap.drop = sys_module
# Broad admin catch-all (mounts, swap, and more); test your app,
# as dropping this can break in-container mounts
lxc.cap.drop = sys_admin
# No fiddling with process priorities
lxc.cap.drop = sys_nice
# No process accounting
lxc.cap.drop = sys_pacct
# No raw I/O port access
lxc.cap.drop = sys_rawio
# No touching the system clock
lxc.cap.drop = sys_time
This ensures that even if an attacker gains root inside the container, they cannot insert a malicious kernel module to hide their tracks.
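To confirm the drops took effect, attach to the running container and inspect the bounding set. This assumes a 3.8+ kernel for lxc-attach and the libcap capsh utility installed inside the guest:
# The dropped capabilities should be absent from the bounding set
sudo lxc-attach -n my-web-server -- capsh --print | grep -i bound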
2. Limit Resources with Cgroups
Don't let a runaway process fork-bomb your host. Set strict limits in your configuration. This is crucial if you are adhering to strict SLAs.
# Limit memory to 512MB
lxc.cgroup.memory.limit_in_bytes = 536870912
# Limit memory + swap to 1GB
lxc.cgroup.memory.memsw.limit_in_bytes = 1073741824
# CPU shares (default is 1024)
lxc.cgroup.cpu.shares = 512
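A nice bonus: these knobs are live. You can read or change them on a running container with lxc-cgroup, no restart required:
# Read the current memory ceiling
sudo lxc-cgroup -n my-web-server memory.limit_in_bytes
# Throttle CPU shares on the fly
sudo lxc-cgroup -n my-web-server cpu.shares 256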
3. AppArmor is Your Friend
On Ubuntu systems, AppArmor is mandatory. Docker 0.7+ started shipping with a default profile, but for raw LXC, you need to ensure the profile is loaded. Verify your container status:
sudo lxc-info -n my-web-server
# Output should show state and PID
# Check if AppArmor is enforcing
sudo apparmor_status | grep lxc
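If that grep comes back empty, don't guess. Pin the profile explicitly in the container config. The profile name below is Ubuntu's stock one; adjust it if you have written your own:
# Force the stock Ubuntu LXC profile; never run unconfined
lxc.aa_profile = lxc-container-default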
Pro Tip: If you are serving data to Norwegian users, latency matters. The NIX (Norwegian Internet Exchange) in Oslo handles the bulk of local traffic. Ensure your host has direct peering or low hops to NIX. Security is useless if your site takes 300ms to load the first byte.
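Measuring that is trivial. curl's timing variables report time-to-first-byte from any test box near your users (example.com below is a placeholder for your own endpoint):
# Print the time to first byte in seconds
curl -o /dev/null -s -w 'TTFB: %{time_starttransfer}s\n' http://example.com/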
The Architecture of Trust: Containers inside KVM
Here is the architecture I recommend to every CTO I speak with: Nest your containers inside a KVM VPS.
This gives you the best of both worlds:
- DevOps Agility: You can still use Docker or LXC to manage your application dependencies and deployment speed.
- Hard Security Boundary: The KVM hypervisor acts as the containment wall. If a container breaks out, it only breaks out into the Virtual Machine, not the physical host.
This approach is essential for compliance with the Norwegian Personal Data Act (Personopplysningsloven) and the EU Data Protection Directive. Datatilsynet (The Norwegian Data Protection Authority) takes a dim view of shared data environments where strict logical separation cannot be proven. With KVM, you can point to the hypervisor and say, "This is my dedicated kernel; nobody else touches it."
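Trust, but verify. If you are not sure what your current provider actually runs under the hood, two quick checks settle it (virt-what is packaged in the major distros):
# Identifies the hypervisor; on a proper VM this prints 'kvm', 'xen', etc.
sudo virt-what
# OpenVZ guests expose /proc/vz; KVM guests do not
test -d /proc/vz && echo "OpenVZ: shared kernel" || echo "no OpenVZ detected"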
Network Isolation Example
When running Docker or LXC inside your CoolVDS KVM instance, use iptables to strictly control what traffic reaches the container bridge. Don't blindly trust the Docker daemon's automatic iptables manipulation.
# Flush existing filter rules (the nat table, where Docker adds its
# MASQUERADE rule, is untouched by a plain -F)
iptables -F
# Default drop policy -- apply this from a script, not line by line
# over SSH, or you will lock yourself out
iptables -P INPUT DROP
iptables -P FORWARD DROP
# Allow loopback traffic; many local services break without this
iptables -A INPUT -i lo -j ACCEPT
# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Allow SSH (change 22 to your custom port!)
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow web traffic
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
# Forward outbound traffic from the docker0 bridge; only replies come back in
iptables -A FORWARD -i docker0 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o docker0 -m state --state ESTABLISHED,RELATED -j ACCEPT
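One gotcha: these rules evaporate on reboot. On Debian and Ubuntu, the iptables-persistent package reloads them at boot (paths and tooling differ on other distros):
# Save the running ruleset where iptables-persistent expects it
sudo apt-get install iptables-persistent
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'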
Why I Choose CoolVDS for this Stack
We are systems engineers, not gamblers. When I provision a server, I look for two things: True Virtualization and I/O Performance.
CoolVDS runs exclusively on KVM. They don't oversell via OpenVZ. When you buy 2 vCPUs, you get those cycles. More importantly, in 2014, disk I/O is the bottleneck. Spinning rust (HDDs) kills database performance. CoolVDS has deployed enterprise-grade SSDs across their fleet. When you run `iotop` on their instances, you actually see the throughput you paid for.
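Don't take my word for it, or any provider's marketing page. Benchmark the disk yourself. A crude but honest pair of tests, assuming fio is installed:
# Sequential write, bypassing the page cache
dd if=/dev/zero of=ddtest bs=1M count=1024 oflag=direct conv=fdatasync && rm ddtest
# Random 4K reads -- far closer to what a database actually does
fio --name=randread --rw=randread --bs=4k --size=512m --direct=1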
The container revolution is exciting, but don't let the hype compromise your security. Wrap that Docker container in a KVM instance, configure your cgroups, and sleep better at night.
Ready to secure your stack? Deploy a KVM-based SSD VPS on CoolVDS today and get the isolation your architecture demands.