
Container Security in 2014: Why Shared Kernels Keep Me Awake at Night

It has been exactly one month since the Heartbleed bug (CVE-2014-0160) tore through the internet, exposing private keys and shattering our trust in OpenSSL. If that wasn't enough to make you paranoid, the sudden explosion of Docker—which just hit version 0.11 this May—should be. Don't get me wrong, I love the idea of shipping code in standardized units. But as a systems architect looking at the Norwegian hosting market, I see a dangerous trend: developers treating containers like full Virtual Machines. They aren't.

When you spin up an LXC container or a Docker instance, you are not getting a virtualized hardware layer. You are getting a slice of the host's kernel using cgroups and namespaces. If that kernel has a vulnerability, or if your container privileges aren't clamped down with AppArmor, you aren't just compromising one app—you're handing over the keys to the entire host. In this guide, we are going to look at how to secure these environments without sacrificing the performance we crave.

The "Root" of the Problem

By default, processes inside a container often run as root. In a true KVM environment—like what we enforce for all instances at CoolVDS—this is contained within the virtual machine's kernel. In a container, root inside can theoretically become root outside if there is a breakout exploit. We saw proofs of concept for this with simple capability leaking earlier this year.
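
If you are driving LXC directly, one immediate mitigation is to drop capabilities the workload does not need in the container's configuration. A minimal sketch (the container path placeholder and the capability list below are illustrative; tune them for your application):

# /var/lib/lxc/<container>/config -- drop capabilities the workload does not need
lxc.cap.drop = sys_module sys_time sys_rawio mac_admin mac_override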

If you are managing your own bare metal or VPS and deciding to layer Docker on top, you need to verify your kernel configuration immediately. Do not assume your distribution default is safe.

1. Audit Your Capabilities

First, check what your kernel actually supports regarding containers. If you are running Ubuntu 14.04 LTS (Trusty Tahr), you have decent defaults, but verify them.

# Check LXC configuration support
lxc-checkconfig

# You want to see "enabled" across the board, specifically:
# Cgroup: enabled
# Cgroup namespace: enabled
# User namespace: enabled

If "User namespace" is missing, you are running a high risk. User namespaces allow you to map the container's root user (UID 0) to a non-privileged user on the host. This is your first line of defense.

Network Isolation: The Forgotten Layer

Another pain point I see in local deployments is the default bridge networking. By default, containers can talk to each other. If one container gets compromised via a SQL injection, it can port scan its neighbors on the docker0 or lxcbr0 bridge.

You need to lock this down using iptables. Do not rely on the daemon to do this for you. Here is a standard lockdown script I use for my private nodes to ensure containers can only talk to the world, not each other.

# Flush existing rules (BE CAREFUL on production systems)
iptables -F
iptables -t nat -F

# strict default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP

# Allow established connections
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow SSH (adjust port as needed)
iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# FORWARD chain: Allow containers to talk to the WAN (eth0) but NOT each other
# Assuming 172.17.42.0/24 is your bridge subnet
iptables -A FORWARD -i docker0 -s 172.17.42.0/24 -o eth0 -j ACCEPT
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

# Log dropped packets so you can debug later
iptables -A INPUT -j LOG --log-prefix "[IPTABLES DROP] "
Pro Tip: Latency matters. Routing traffic through complex iptables chains adds overhead. At CoolVDS, our network architecture connects directly to NIX (Norwegian Internet Exchange) in Oslo. We handle the heavy DDoS protection at the edge, so your local iptables don't get flooded with garbage traffic, preserving your CPU for what matters—serving requests.

Storage Performance: The I/O Bottleneck

Containers are famous for starting fast, but they can be terrible at disk I/O if you use the wrong storage driver. The Device Mapper (devicemapper) driver has been problematic in early Docker releases. AUFS behaves better, but it is not in the mainline kernel, so some distributions require patching to get it.
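
Before tuning anything else, confirm which driver your daemon actually picked; on Ubuntu 14.04 the stock kernel ships AUFS, so you should not be falling back to devicemapper by accident. The options below are a sketch, not a drop-in config:

# Check the active storage driver
docker info | grep -i 'storage driver'

# If it came up as devicemapper, force AUFS via the daemon options
# (on Ubuntu: /etc/default/docker)
DOCKER_OPTS="-s aufs"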

However, the software layer means nothing if the physical disk is slow. In 2014, spinning rust (HDD) is dead for database hosting. You need random I/O performance.

We are seeing enterprise SSDs saturate the SATA III bus, with PCIe-based flash storage emerging to push past it. While "NVMe storage" is a term you will hear more about in enterprise datasheets soon, the reality today is that you need high-performance SSDs to handle the concurrent read/write operations of multiple containers. If your host is suffering from I/O wait, your containers will hang, regardless of how much CPU you throw at them.
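
A quick way to see whether the host is I/O-bound is iostat from the sysstat package. Watch the %iowait figure and the per-device await column while your containers are under load:

# Install sysstat if it is not already present (Debian/Ubuntu)
apt-get install sysstat

# Extended device statistics, refreshed every second
iostat -x 1

Sustained double-digit %iowait, or await values far above your disk's rated latency, means your containers are queuing on storage no matter how idle the CPU looks.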

The "CoolVDS" Factor: KVM vs. Containers

This brings us to the architectural decision: managed hosting built on bare-metal containers versus full KVM virtualization.

Many budget providers in Europe resell OpenVZ containers as "VPS." This is dishonest. In OpenVZ (and LXC/Docker), you share the kernel with 50 other customers (