Taming the Beast: Multi-Host Docker Networking with Open vSwitch and GRE
Let's be honest: Docker is moving faster than the rest of the ecosystem can keep up. Version 0.9 dropped just last week with the new execution driver API, finally letting us move away from the hard LXC dependency. That's great for portability, but if you are trying to run a serious distributed application across multiple servers, the networking stack is still a disaster zone.
I see developers pushing containers to production every day, assuming that --link is a magic bullet. It works fine on a laptop. But the moment you need to scale that MySQL backend to a secondary node for high availability, or split your Nginx frontend from your PHP-FPM workers across two different physical hosts, the abstraction leaks. You are left staring at iptables rules and dealing with NAT overhead that eats your CPU cycles.
If you are running mission-critical workloads in Norway, you can't afford that latency. I recently debugged a setup for a client in Oslo where their API response times doubled simply because of inefficient port mapping and packet routing between two VPS instances hosted on a legacy provider. The solution isn't waiting for the "next big orchestration tool"—it's getting your hands dirty with Open vSwitch (OVS) and GRE tunnels right now.
The Problem: Docker's "Bridge to Nowhere"
By default, Docker creates a docker0 bridge. It assigns a subnet (usually 172.17.0.0/16) and NATs traffic out to the world. This is isolation, not networking. If Host A wants to talk to a container on Host B, it has to hit Host B's public IP, get routed through the firewall, NATed to the internal IP, and finally delivered. It is messy, insecure, and hell to debug.
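You can see that plumbing on any stock Docker host. A quick look, assuming the default bridge and subnet (interface names and exact rules vary slightly between Docker versions):
ip addr show docker0
sudo iptables -t nat -L POSTROUTING -n
# Look for the MASQUERADE rule covering 172.17.0.0/16 -- that is the NAT every outbound packet pays for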
We need a flat network. We want Container A on Host 1 (10.0.0.2) to ping Container B on Host 2 (10.0.0.3) directly, without caring about the underlying hardware. This is where Open vSwitch comes in.
The Architecture: OVS + GRE Tunnels
We are going to bypass the default Docker bridge and use OVS to create a virtual switch that spans across our CoolVDS instances using a GRE (Generic Routing Encapsulation) tunnel. This encapsulates the internal traffic and sends it over the public network transparently.
Pro Tip: This setup requires kernel module support for openvswitch and gre. Most budget VPS providers running OpenVZ will fail here because you share a kernel with the host. You need a proper KVM environment like CoolVDS to load your own kernel modules. Don't waste time trying this on shared containers.
Step 1: Preparing the Hosts
Assume we have two CoolVDS KVM instances (the 192.168.1.x addresses below stand in for the nodes' real public IPs; substitute your own):
- Node A: 192.168.1.10
- Node B: 192.168.1.11
First, install Open vSwitch on both nodes (assuming Ubuntu 12.04 LTS or 13.10):
sudo apt-get update
sudo apt-get install openvswitch-switch
Verify the kernel module is loaded:
lsmod | grep openvswitch
If that returns nothing, run sudo modprobe openvswitch. If you get a "Permission Denied" error, call your hosting provider and ask for a refund. On CoolVDS, this works out of the box because we give you full hardware virtualization.
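While you are at it, check GRE support too. On most kernels it is a separate module (the exact module name varies between kernel builds, and some fold GRE handling into openvswitch itself):
sudo modprobe gre
lsmod | grep gre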
Step 2: Creating the OVS Bridge
On both nodes, we create a bridge named br0.
sudo ovs-vsctl add-br br0
Now, we need to configure the internal IP addresses for this bridge. This will be the gateway for your containers.
On Node A:
sudo ifconfig br0 10.0.0.1 netmask 255.255.255.0 up
On Node B:
sudo ifconfig br0 10.0.0.2 netmask 255.255.255.0 up
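Keep in mind that ifconfig settings do not survive a reboot. If you want the address to persist, a stanza along these lines in /etc/network/interfaces does the job on Ubuntu, assuming the openvswitch-switch service has recreated br0 before networking comes up (OVS stores its bridge and port configuration in its own database, so the ovs-vsctl commands only need to be run once):
# /etc/network/interfaces on Node A (use 10.0.0.2 on Node B)
auto br0
iface br0 inet static
    address 10.0.0.1
    netmask 255.255.255.0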
Step 3: Establishing the GRE Tunnel
Here is where the magic happens. We connect the two virtual switches over the physical network.
On Node A (connect to B):
sudo ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.11
On Node B (connect to A):
sudo ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.1.10
At this point, you should be able to ping 10.0.0.2 from Node A. If you see high latency, check the physical path. Since CoolVDS servers are located in premium data centers with direct peering to NIX (Norwegian Internet Exchange), the overhead of GRE encapsulation is negligible—usually under 2ms.
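A quick sanity check on either node before moving on (the UUIDs in your output will differ):
sudo ovs-vsctl show
# Expect br0 with a port gre0 of type gre pointing at the other node's public IP
ping -c 3 10.0.0.2   # from Node A; from Node B, ping 10.0.0.1 instead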
Step 4: Connecting Docker Containers
Docker doesn't natively speak OVS yet. We have to use a wrapper script or manually create veth pairs. The most reliable tool currently is Jérôme Petazzoni's pipework. It is a shell script that hacks the network namespace of a running container.
Install pipework:
sudo wget https://raw.github.com/jpetazzo/pipework/master/pipework -O /usr/local/bin/pipework
sudo chmod +x /usr/local/bin/pipework
Now, let's launch a container without networking and attach it to our OVS bridge.
On Node A:
# Start a container without network
DOCKER_ID=$(sudo docker run -d -i -t --net=none ubuntu /bin/bash)
# Assign IP 10.0.0.10 to the container via br0
sudo pipework br0 $DOCKER_ID 10.0.0.10/24
On Node B:
DOCKER_ID=$(sudo docker run -d -i -t --net=none ubuntu /bin/bash)
sudo pipework br0 $DOCKER_ID 10.0.0.11/24
Now, enter the container on Node A and ping Node B:
sudo docker attach $DOCKER_ID
root@container:/# ping 10.0.0.11
PING 10.0.0.11 (10.0.0.11) 56(84) bytes of data.
64 bytes from 10.0.0.11: icmp_req=1 ttl=64 time=0.45 ms
0.45ms. That is the power of a flat network.
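If you are curious what pipework actually does (or need to debug it), the rough manual equivalent looks like this: create a veth pair, plug one end into the OVS bridge, and push the other end into the container's network namespace. The interface names below are illustrative, and this assumes iproute2 with namespace support:
# Find the container's PID and expose its network namespace
PID=$(sudo docker inspect --format '{{ .State.Pid }}' $DOCKER_ID)
sudo mkdir -p /var/run/netns
sudo ln -s /proc/$PID/ns/net /var/run/netns/$PID
# Create a veth pair: one end stays on the host and joins br0
sudo ip link add veth-host type veth peer name veth-guest
sudo ovs-vsctl add-port br0 veth-host
sudo ip link set veth-host up
# Move the other end into the container and configure it as eth0
sudo ip link set veth-guest netns $PID
sudo ip netns exec $PID ip link set veth-guest name eth0
sudo ip netns exec $PID ip addr add 10.0.0.10/24 dev eth0
sudo ip netns exec $PID ip link set eth0 up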
Security Implications and Norwegian Data Law
Running a flat network over GRE means your traffic is encapsulated, but standard GRE is not encrypted. If you are transmitting sensitive personal data (Personopplysningsloven), you must run this GRE tunnel over IPsec or use a VPN layer below it.
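As a rough sketch of the IPsec route, assuming strongSwan with a pre-shared key (certificates are the better choice for production, and you should verify the policy actually covers your traffic with ip xfrm policy): host-to-host transport mode between the two public IPs encrypts everything flowing between the nodes, the GRE tunnel included.
# On Node A (swap left and right on Node B)
sudo apt-get install strongswan
sudo tee -a /etc/ipsec.conf <<'EOF'
conn gre-encrypt
    left=192.168.1.10
    right=192.168.1.11
    type=transport
    authby=secret
    auto=start
EOF
# Shared secret -- use a real one, and keep this file readable by root only
echo '192.168.1.10 192.168.1.11 : PSK "change-this-secret"' | sudo tee -a /etc/ipsec.secrets
sudo ipsec restart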
This is another reason why hardware choice matters. Encryption is CPU intensive. If you are on a noisy neighbor VPS where someone else is mining Bitcoin or compiling kernels, your I/O and encryption operations will choke, and your network throughput will tank.
For compliant setups, we recommend configuring iptables on the bridge interface to strictly limit which containers can talk to each other:
# Allow return traffic for connections that are already established
sudo iptables -A FORWARD -i br0 -m state --state ESTABLISHED,RELATED -j ACCEPT
# Only allow new traffic on port 80 between containers
sudo iptables -A FORWARD -i br0 -p tcp --dport 80 -j ACCEPT
sudo iptables -A FORWARD -i br0 -j DROP
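These rules also vanish on reboot; on Ubuntu, iptables-persistent restores them at boot (the rules file path differs slightly between releases):
sudo apt-get install iptables-persistent
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'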
Why Infrastructure Choice is the Bottleneck
You can script OVS and Docker all day, but if the underlying disk I/O is garbage, your database containers will time out regardless of how fast your network is. In 2014, we are seeing a shift from spinning rust to SSDs, but not all SSDs are created equal.
We built CoolVDS on KVM because we believe sysadmins need to own the kernel. We use pure SSD RAID arrays because waiting on I/O wait states is losing money. When you are building a custom network topology like the one above, you need the stability of dedicated resources.
Don't let your infrastructure dictate your architecture. If you need a GRE tunnel, build it. If you need custom kernel modules, load them.
Ready to build a real cluster? Spin up a KVM instance on CoolVDS today and get full root access in under 55 seconds.