
Locking Down Linux Containers: A Survival Guide for the Paranoiac SysAdmin (May 2013 Edition)

Let’s be honest: virtualizing at the operating system level is a double-edged sword. You get near-native performance because you aren't emulating hardware, but you are sleeping in the same bed as every other tenant on that host. If you are running a standard OpenVZ or LXC setup in production today, you are trusting the kernel to keep users apart. And frankly, the Linux kernel is a complex beast with a wide attack surface.

I’ve seen it happen too many times. A developer decides to run a questionable binary inside a container, exploits a race condition in the kernel, and suddenly they are poking around the host node. If you are hosting sensitive data regulated by Datatilsynet here in Norway, that is not just a technical failure; it’s a legal nightmare waiting to happen under the Personal Data Act (Personopplysningsloven).

If you are deploying containers—whether you are experimenting with that new "Docker" project (v0.3 is interesting, but not production-ready) or sticking to battle-tested LXC/OpenVZ—you need to harden the hell out of them. Here is how we secure isolation at the metal level.

1. The Root of All Evil: UID Namespaces

The biggest threat in containerization right now is that root inside the container is equal to root on the host. If a process breaks out of the chroot jail, it has god-mode privileges on the physical server. This is unacceptable.

You must implement User Namespaces (available in recent kernels, though experimental in some distros). This maps UID 0 inside the container to an unprivileged user (like UID 100000) on the host. Even if an attacker breaks out, they find themselves as a nobody with no permissions.

If you are running LXC on Ubuntu 12.04 LTS or anything else with a recent enough kernel (user namespace support was only completed in 3.8), configure the per-container config to map these IDs explicitly:

# /var/lib/lxc/my-container/config

# Map container root (0) to host user 100000
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536

This single change renders the vast majority of privilege escalation exploits useless, because root inside the container no longer maps to root on the host. If your hosting provider doesn't support user namespaces or custom kernel flags, move your data. It's that simple.
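
Before you rely on this, verify that the kernel actually has the feature and that the mapping took effect. A minimal sanity check from the host, assuming the container name from the config above and a reasonably recent LXC that supports lxc-info -p:

# Confirm the kernel was built with user namespace support
grep CONFIG_USER_NS /boot/config-$(uname -r)

# After starting the container, its init process should show up
# on the host as UID 100000, not as root
PID=$(lxc-info -n my-container -p | awk '{print $2}')
ps -o uid,pid,comm -p "$PID"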

2. Drop Capabilities Like They’re Hot

By default, Linux containers keep too many capabilities. Does your web server really need to load kernel modules (CAP_SYS_MODULE) or manipulate the system clock (CAP_SYS_TIME)? No. It just needs to serve PHP and talk to MySQL.

We need to follow the principle of least privilege. In your container configuration, explicitly drop everything you don't need. A compromised container with CAP_SYS_ADMIN is essentially a compromised host.

# Dropping dangerous capabilities in LXC
lxc.cap.drop = sys_module
lxc.cap.drop = sys_time
lxc.cap.drop = sys_admin
lxc.cap.drop = sys_boot
lxc.cap.drop = sys_rawio

Pro Tip: If you are using OpenVZ (common among cheap VPS offers in Norway), check your /etc/vz/vz.conf. Many budget providers leave these wide open to reduce support tickets. At CoolVDS, we lock these down by default; better yet, we encourage you to use KVM, where you get your own kernel entirely.
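
To see what a running container is actually left with, read the effective capability mask of its init process from the host and decode it. A quick sketch, assuming the same example container name and that capsh (shipped with libcap) is installed:

# Grab the container's init PID, then inspect its effective capabilities
PID=$(lxc-info -n my-container -p | awk '{print $2}')
grep CapEff /proc/$PID/status

# Decode the hex bitmask into human-readable capability names
capsh --decode=$(grep CapEff /proc/$PID/status | awk '{print $2}')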

3. Network Isolation with iptables

Bridging is great for connectivity, but dangerous for security. You don't want containers ARP-spoofing each other or sniffing traffic on the virtual bridge. While ebtables can help, good old iptables on the host is your primary defense line.

Ensure that traffic destined for a container follows one explicit, whitelisted path and cannot cross-talk with other interfaces unless you allow it. Here is a snippet for the host node that enforces strict forwarding rules:

# /etc/sysconfig/iptables (CentOS 6)

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]

# Allow loopback
-A INPUT -i lo -j ACCEPT

# Allow established connections
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT

# Don't lock yourself out: allow management SSH to the host
-A INPUT -p tcp --dport 22 -j ACCEPT

# FORWARD chain: Only allow traffic to specific container IPs
-A FORWARD -i eth0 -o br0 -d 192.168.1.10 -p tcp --dport 80 -j ACCEPT
-A FORWARD -i br0 -o eth0 -s 192.168.1.10 -j ACCEPT

# Log and Drop everything else
-A INPUT -j LOG --log-prefix "DROPPED: "
COMMIT
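
If your containers share a bridge, iptables alone will not see purely layer-2 traffic between them. Two extra measures are worth a look; the veth name, MAC and IP below are placeholders for one specific container, not values from a real setup:

# Force bridged traffic through the iptables FORWARD chain
sysctl -w net.bridge.bridge-nf-call-iptables=1

# ebtables: only accept frames carrying the MAC and IP actually assigned
# to this container's veth port (veth-c1 / 00:16:3e:aa:bb:cc / 192.168.1.10)
ebtables -A FORWARD -i veth-c1 -s ! 00:16:3e:aa:bb:cc -j DROP
ebtables -A FORWARD -p IPv4 -i veth-c1 --ip-src ! 192.168.1.10 -j DROP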

4. Resource Control: Don't Let Neighbors Steal Your CPU

One of the classic "noisy neighbor" problems in shared hosting is CPU stealing. In 2013, with the explosion of heavy PHP frameworks and Java apps, a single container spinning a `while(1)` loop can degrade the performance of the whole node if Control Groups (cgroups) aren't tuned correctly.

You need to verify your disk I/O priorities. While the CFQ scheduler helps, you should demand strict I/O limits. We utilize the blkio controller in cgroups to guarantee throughput.

# Checking cgroup mounts
mount -t cgroup

# Example: throttling write speed for a specific container ID
echo "8:0 10485760" > /cgroup/blkio/lxc/container1/blkio.throttle.write_bps_device

This limits that specific container to 10 MB/s of writes on device 8:0 (the major:minor numbers, typically /dev/sda). It prevents a runaway log file from choking the disk controller for everyone else.
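
The same cgroup approach reins in CPU hogs. A rough sketch, assuming the cpu controller is mounted under /cgroup/cpu alongside blkio; the hard cap needs CFS bandwidth control (kernel 3.2+, so check that cpu.cfs_quota_us exists before relying on it):

# Halve container1's CPU weight relative to its neighbours (default is 1024)
echo 512 > /cgroup/cpu/lxc/container1/cpu.shares

# Hard-cap it to a single core: 100ms of CPU time per 100ms period
echo 100000 > /cgroup/cpu/lxc/container1/cpu.cfs_period_us
echo 100000 > /cgroup/cpu/lxc/container1/cpu.cfs_quota_us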

The Hardware Reality: Why Storage Speed Matters

Security isn't just about hackers; it's about availability. A DoS attack isn't always network-based; it can be I/O based. If your database is fighting for IOPS on a spinning Rust HDD, you are vulnerable.

This is why the industry is shifting toward PCIe-based flash storage and SSDs. At CoolVDS, we’ve moved away from mechanical SAS drives for our primary hosting tiers. Our High-Performance SSD instances (utilizing tech similar to the new NVMe standards emerging in enterprise hardware) provide the I/O headroom necessary to absorb spikes without crashing your MySQL service.
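
If you want to know whether your current host actually has that headroom, measure it rather than guess. A quick check, assuming the third-party ioping utility is installed (it is in EPEL and most distro repos) and that your data lives under /var/lib/mysql:

# Per-request latency on the volume that holds your database
ioping -c 10 /var/lib/mysql

# Rough sequential write throughput, bypassing the page cache
dd if=/dev/zero of=/var/lib/mysql/ddtest bs=1M count=1024 oflag=direct
rm -f /var/lib/mysql/ddtest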

Feature                 | Generic OpenVZ VPS         | CoolVDS KVM/SSD
------------------------|----------------------------|-----------------------------
Kernel Isolation        | Shared (Weak)              | Dedicated (Strong)
Disk I/O                | SATA/SAS (Slow, variable)  | PCIe SSD (Consistent, Fast)
Privacy (Datatilsynet)  | Data bleeds possible       | Strict Hardware Segregation
Latency to NIX          | Variable                   | < 2 ms

Conclusion: Paranoia is a Virtue

If you are managing infrastructure in Norway or across Europe, the old "fire and forget" mentality of VPS hosting is dead. You are responsible for the data integrity of your users. Containers are fantastic for density and quick deployments, but their out-of-the-box configuration is a security liability.

If you need absolute isolation without sacrificing the speed of flash storage, you might want to stop fighting with lxc.conf and simply use a dedicated kernel environment.

Stop gambling with shared kernels. Deploy a fully isolated, PCIe-SSD powered KVM instance on CoolVDS today. Your uptime (and your sanity) will thank you.