The I/O Bottleneck: Why Your "Cloud" Strategy Might Fail in 2010

It is April 2010. If I hear one more sales representative pitch me a "Cloud Solution" that is essentially just a rebranded shared hosting account with a prettier control panel, I might disconnect my BlackBerry service. The buzz around Amazon EC2 and the emerging Rackspace Cloud is deafening, but as a systems architect responsible for infrastructure in Norway, I look at one metric that marketing brochures conveniently ignore: Disk I/O Latency.

We are seeing a shift. Enterprises are moving from physical racks to virtualized environments. But virtualization introduces overhead. If you are running a high-concurrency MySQL database on a virtual machine fighting for disk access with twenty other "noisy neighbors," your response times will suffer. Here is how we navigate the storage minefield this year.

The Spinner Problem: SATA vs. SAS vs. The SSD Dream

Most budget VPS providers are stacking cheap 7200 RPM SATA drives in RAID 5 arrays. This is a recipe for disaster for any write-heavy application. In a RAID 5 setup, every write operation requires a parity calculation. When you have 50 virtual machines hitting that array simultaneously, the physics of spinning platters takes over. The read/write head can only move so fast.
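The write penalty is easy to put in rough numbers. The sketch below assumes ~180 random IOPS per 7200 RPM SATA spindle, which is a ballpark figure, not a benchmark:

```shell
# RAID 5 small-write penalty: each logical random write costs 4 physical I/Os
# (read old data, read old parity, write new data, write new parity).
# RAID 10 costs only 2 (write to both halves of the mirror).
spindles=4
iops_per_disk=180          # rough figure for a 7200 RPM SATA drive
raid5_penalty=4
raid10_penalty=2
echo "RAID 5  random-write IOPS: $(( spindles * iops_per_disk / raid5_penalty ))"
echo "RAID 10 random-write IOPS: $(( spindles * iops_per_disk / raid10_penalty ))"
```

Same four disks, double the usable write throughput. The spindles did not get any faster; the array simply stopped doing busywork.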

The Reality Check: If you are running `top` or `vmstat 1` and your `wa` (IO wait) percentage is consistently over 20%, your CPU is just sitting idle, waiting for the disk to wake up. It is wasted money.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  4      0 245000  45000 120000    0    0   450   800  950 1200 15  5 40 40

Above: A typical bottleneck. 40% wait time means your application is dead in the water, regardless of how much RAM you allocated.
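If you want to script the check rather than eyeball it, here is a small helper that pulls the `wa` column out of a vmstat line. The sample line is the one above; in practice you would pipe live `vmstat 1` output through the same awk:

```shell
#!/bin/sh
# Extract the 'wa' (iowait) column -- field 16 of vmstat's output -- and
# complain if it is above the 20% threshold discussed above.
sample=" 1  4      0 245000  45000 120000    0    0   450   800  950 1200 15  5 40 40"
wa=$(printf '%s\n' "$sample" | awk '{print $16}')
echo "iowait: ${wa}%"
if [ "$wa" -gt 20 ]; then
    echo "WARNING: this box is disk-bound, not CPU-bound"
fi
```

Run it from cron and mail yourself the warning; it is cheaper than finding out from your customers.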

At CoolVDS, we have abandoned RAID 5 entirely for our production nodes. We utilize RAID 10 with 15k RPM SAS drives. While Solid State Drives (SSDs) like the Intel X25-M are entering the enterprise space, they are still cost-prohibitive for mass storage. However, a properly tuned RAID 10 SAS array offers the best balance of redundancy and write performance available today.
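For those building their own boxes, a software RAID 10 array is a two-minute job with mdadm. This is a sketch only: the device names are placeholders, and our production nodes use hardware controllers, so treat it as illustrative rather than a recipe.

```shell
# Create a 4-disk software RAID 10 array. DESTRUCTIVE: wipes all four disks.
# /dev/sd[abcd] are placeholder device names -- substitute your own.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd
cat /proc/mdstat          # confirm the array is active and resyncing
```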

Filesystem Tuning: Squeezing Performance out of EXT3/EXT4

While we wait for SSDs to become standard (maybe in a few years), you need to optimize what you have. If you are running CentOS 5.4 or the new Ubuntu 10.04 LTS (Lucid Lynx), you are likely using EXT3 or the newer EXT4.

A simple, often overlooked tweak is the `noatime` mount option. By default, Linux writes to the disk every time a file is read just to update the access time. For a web server serving thousands of static images, this creates a massive amount of unnecessary write operations.

Edit your `/etc/fstab`:

/dev/sda1   /   ext3   defaults,noatime,nodiratime   1 1
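Once that entry is saved, you can apply it without a reboot (assuming root access):

```shell
# Remount the root filesystem so the new fstab options take effect now.
mount -o remount,noatime /
mount | grep ' / '        # verify that noatime appears in the active options
```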

Architect's Note: Switching from EXT3 to EXT4 can also yield significant performance gains due to extents and delayed allocation, but ensure your kernel is up to date. Stability is paramount over bleeding-edge speed in production.
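If you do decide to migrate an existing EXT3 volume in place, the standard tune2fs route looks roughly like this. A sketch only: do it from a rescue environment, back up first, and note that only files written after the migration get the benefit of extents.

```shell
# Enable EXT4 features on an unmounted EXT3 filesystem.
# /dev/sda1 is a placeholder device name.
umount /dev/sda1
tune2fs -O extents,uninit_bg,dir_index /dev/sda1
fsck -pfD /dev/sda1       # mandatory after changing features; -D optimizes dirs
# Finally, change 'ext3' to 'ext4' for this partition in /etc/fstab.
```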

Data Sovereignty: The Norwegian Context

Latency isn't just about physics; it is about geography. Hosting your database in Virginia (US-East) when your customers are in Oslo is illogical. The round-trip time (RTT) alone adds roughly 100ms to every request. Multiply that by the dozens of sequential queries required to render a Magento storefront, and your site feels sluggish.
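The arithmetic is blunt. Both figures below are rough assumptions (~100ms transatlantic RTT, a few dozen sequential round trips per page), but the conclusion survives any reasonable adjustment:

```shell
# Pure network wait for a page that issues sequential database round trips.
rtt_ms=100      # Oslo <-> US-East round trip, approximate
queries=40      # sequential round trips for one storefront page (assumption)
echo "Pure network wait: $(( rtt_ms * queries )) ms per page load"
```

Four seconds of dead air before the disk has done a single thing. No amount of server tuning buys that back.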

More critically, we must address the Personal Data Act (Personopplysningsloven) of 2000. The Data Inspectorate (Datatilsynet) is increasingly vigilant about how Norwegian citizen data is handled. While the US Safe Harbor framework exists, many Norwegian CTOs are uncomfortable with the legal ambiguity of storing sensitive customer data on US soil.

Hosting within Norway, specifically in data centers connected directly to the NIX (Norwegian Internet Exchange), solves two problems:

  1. Compliance: You are fully protected under Norwegian law, avoiding cross-border data transfer headaches.
  2. Speed: Latency from Oslo to our datacenter is typically under 2ms.

The Virtualization Layer: Xen and KVM vs. OpenVZ

Not all VPS instances are created equal. In 2010, the market is flooded with OpenVZ containers. OpenVZ relies on a shared kernel, which lets providers massively oversell resources. If one user on the node gets DDoS'd, everyone suffers.

This is why CoolVDS standardizes on KVM (Kernel-based Virtual Machine) and Xen. These provide true hardware virtualization. Your RAM is your RAM. Your swap is your swap. It prevents the "noisy neighbor" effect from crashing your application during peak hours. If you are serious about stability, shared kernel virtualization is not an option for your primary database.
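A quick way to check what you actually bought: a heuristic sketch using the well-known kernel markers of the day. The paths can differ across kernels and distros, so treat a "bare metal or unrecognized" answer with suspicion rather than certainty.

```shell
#!/bin/sh
# Heuristic check: which virtualization is this VPS actually running on?
if [ -d /proc/vz ] && [ ! -d /proc/bc ]; then
    virt="OpenVZ container (shared kernel)"
elif [ -r /sys/hypervisor/type ]; then
    virt="$(cat /sys/hypervisor/type) hypervisor"       # reads 'xen' on Xen guests
elif grep -q '^flags.*hypervisor' /proc/cpuinfo 2>/dev/null; then
    virt="hardware-assisted VM (KVM or similar)"
else
    virt="bare metal or unrecognized"
fi
echo "Detected: $virt"
```

If the first branch fires, your "dedicated resources" are a promise, not a guarantee.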

Final Recommendation

The cloud is a powerful concept, but hardware reality still dictates performance. Don't let your application die in the `iowait` queue. Choose a provider that guarantees RAID 10 performance and keeps your data within Norwegian jurisdiction.

If you need to verify your current bottlenecks, run `iostat -x 2` on your server today. If the numbers scare you, it might be time to test a CoolVDS instance.