The "Cloud" Lie: Why Shared Storage Kills Performance (And How to Fix It)

The Great Storage Bottleneck of 2011

Let’s be honest with ourselves. The word "Cloud" has become the most dangerous marketing term of 2010 and 2011. Managers love it because it promises infinite scalability and pay-as-you-go pricing. But those of us actually managing the systems—the sysadmins staring at `top` at 3:00 AM—know the dirty truth: disk I/O is where cloud servers go to die.

I recently consulted for a Norwegian media outlet trying to scale their Drupal installation on a major US-based "cloud" provider. They had CPU cycles to spare, yet the site was crawling. The culprit? I/O Wait. Their fancy virtual machine was fighting for disk access with five hundred other neighbors on a congested SAN (Storage Area Network).

When you are building critical infrastructure, specifically here in Norway where we value stability and quality, you cannot rely on the "noisy neighbor" lottery. You need guaranteed throughput.

Diagnosing the I/O Choke

Before we talk about solutions, you need to prove the problem. If your load average is high but your CPU usage is low, your server is screaming for data it can't read fast enough. The tool of choice here is iostat (part of the sysstat package on CentOS 5/6 and Debian Squeeze).

Here is what a healthy database server should look like under load:

$ iostat -x 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           15.4    0.0     4.2     1.1     0.0    79.3

Device:         rrqm/s   wrqm/s     r/s     w/s   svctm   %util
sda               0.00    12.00   24.00   45.00    2.50   12.50

If that %iowait column creeps above 20-30%, or your svctm (service time) spikes while %util hits 100%, your storage backend is failing you. On many budget VPS providers, I regularly see %iowait hitting 80% just because another customer on the same physical host decided to run a backup.
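You don't want to discover this by watching iostat scroll by at 3:00 AM. Here is a small sketch of how you might pull the %iowait figure out programmatically (`parse_iowait` is a hypothetical helper, and it assumes sysstat's avg-cpu column layout shown above, where iowait is the fourth value):

```shell
# Hypothetical helper: extract %iowait from `iostat -c` style output.
# Assumes the avg-cpu layout above: the values line follows the header,
# and iowait is the 4th field.
parse_iowait() {
    awk '/avg-cpu/ { getline; print $4 }'
}

# Demo on a captured sample; on a live box you would pipe in real data:
#   iostat -c 1 2 | parse_iowait
sample='avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           15.4    0.0     4.2     1.1     0.0    79.3'
printf '%s\n' "$sample" | parse_iowait    # prints 1.1 for this sample
```

Wrap that in a cron job with a threshold check and you get an early warning before customers start calling.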

The Kernel Scheduler: A Quick Fix?

Before you migrate, you can try to mitigate this by tuning the Linux I/O scheduler. Most distributions (RHEL 6, Ubuntu 10.04) default to CFQ (Completely Fair Queuing). CFQ tries to be fair to all processes, which is great for a desktop but terrible for a database server that needs instant access.

For virtualized environments (Xen or KVM), the deadline or noop schedulers are often superior. noop is particularly effective because the hypervisor handles the reordering, so the guest OS shouldn't waste cycles doing it twice.

You can change this on the fly to test:

# Check current scheduler
cat /sys/block/sda/queue/scheduler
noop anticipatory deadline [cfq]

# Switch to deadline
echo deadline > /sys/block/sda/queue/scheduler

To make it permanent in Grub (legacy), append elevator=deadline to your kernel line in /boot/grub/menu.lst.
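For the legacy-Grub route, here is a sketch of that edit, practiced on a scratch copy of `menu.lst` (always work on a backup; the kernel line and path vary by distribution, and the example entry below is illustrative):

```shell
# Sketch only: practice on a scratch file. On a real box, point MENU at
# /boot/grub/menu.lst and keep the .bak copy.
MENU=$(mktemp)
cat > "$MENU" <<'EOF'
title CentOS (2.6.18-238.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-238.el5 ro root=/dev/sda1
        initrd /initrd-2.6.18-238.el5.img
EOF

cp "$MENU" "$MENU.bak"
# Append elevator=deadline to each kernel line that doesn't already set one
sed -i '/^[[:space:]]*kernel/ { /elevator=/! s/$/ elevator=deadline/ }' "$MENU"
grep kernel "$MENU"
```

The runtime `echo` into `/sys` lets you benchmark the change first; only bake it into the boot line once you have numbers proving it helps.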

The Hardware Reality: SAS 15k vs. SSD

Spinning rust is reaching its physical limits. A standard 15k RPM SAS drive gives you maybe 180-200 IOPS (Input/Output Operations Per Second). If you put 20 VPS customers on a RAID array of 4 disks, you have roughly 800 IOPS to share. One heavy MySQL query can consume 500 IOPS instantly.
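Run the numbers from that paragraph and the problem is obvious (this back-of-envelope sketch ignores RAID-10 write penalties and controller caching):

```shell
# Back-of-envelope IOPS budget for the scenario above
disks=4
iops_per_disk=200
customers=20

total=$((disks * iops_per_disk))      # whole-array budget
per_customer=$((total / customers))   # fair share per VPS
echo "Array total: ${total} IOPS, per customer: ${per_customer} IOPS"
# Array total: 800 IOPS, per customer: 40 IOPS
```

Forty IOPS per tenant, against a single heavy MySQL query that wants 500. That is the arithmetic behind every "my cloud server is slow" ticket.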

This is where the industry is shifting. While the NVMe specification was just released earlier this year (March 2011), it will take years to hit commodity hardware. However, Enterprise SSDs and PCIe flash storage are available now. They offer a quantum leap in performance—thousands of IOPS instead of hundreds.

Pro Tip: Never run a high-traffic database on a filesystem mounted with `atime` (access time) enabled. Every time you read a file, the kernel writes a timestamp, turning reads into writes.

Optimize your `/etc/fstab` immediately:

# /etc/fstab optimization
/dev/sda1   /       ext4    defaults,noatime,nodiratime,barrier=0   1 1

Note: Only disable barriers (`barrier=0`) if you have a battery-backed write cache (BBWC) on your RAID controller, otherwise you risk data corruption during power loss.
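To audit an existing box, you can scan `/proc/mounts` for filesystems still updating access times (`find_atime_mounts` is a hypothetical helper; field 4 of `/proc/mounts` is the option list):

```shell
# Hypothetical helper: print mount points from /proc/mounts whose
# options (field 4) do not include noatime.
find_atime_mounts() {
    awk '$4 !~ /noatime/ { print $2 }'
}

# Demo on captured data; on a live box: find_atime_mounts < /proc/mounts
sample='/dev/sda1 / ext4 rw,noatime,barrier=0 0 0
/dev/sdb1 /var/lib/mysql ext4 rw,relatime 0 0'
printf '%s\n' "$sample" | find_atime_mounts    # prints /var/lib/mysql
```

Any mount it flags can be fixed live with `mount -o remount,noatime <mountpoint>`; no reboot required, though you should still update `/etc/fstab` so the change survives one.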

Data Sovereignty in Norway

Beyond raw speed, we have to talk about jurisdiction. Under the EU Data Protection Directive (95/46/EC), as implemented in Norway’s Personopplysningsloven (the Personal Data Act), knowing where your bits physically live is not optional.

If you host customer data on a US cloud, you are entering a legal grey area regarding Safe Harbor. Hosting in Oslo isn't just about ping times (though 2ms latency to NIX is fantastic); it's about compliance with Datatilsynet requirements.

MySQL Tuning for Storage Latency

If you are stuck on slower storage, you must optimize how MySQL flushes to disk. In `my.cnf`, the `innodb_flush_log_at_trx_commit` setting is the most critical lever for write performance.

  • Value 1 (Default): Flushes to disk after every transaction. Safest, but slowest.
  • Value 2: Flushes to OS cache every transaction, syncs to disk once per second. Good compromise.
  • Value 0: Flushes once per second. Fastest, but you lose 1 second of data if the server crashes.

[mysqld]
# Optimize for performance over absolute ACID compliance if I/O is the bottleneck
innodb_flush_log_at_trx_commit = 2
innodb_buffer_pool_size = 2G  # Set to 70-80% of available RAM
innodb_file_per_table = 1
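That "70-80% of available RAM" comment hides a step worth showing. Here is a sketch of deriving the value from `/proc/meminfo` (`pool_from_meminfo` is a hypothetical helper, using 75% as a middle-of-the-range figure):

```shell
# Sketch: derive innodb_buffer_pool_size as ~75% of RAM, in megabytes.
# /proc/meminfo reports MemTotal in kB.
pool_from_meminfo() {
    awk '/^MemTotal/ { printf "%dM\n", int($2 * 0.75 / 1024) }'
}

# Demo with a fixed 4GB value; live use: pool_from_meminfo < /proc/meminfo
printf 'MemTotal:     4194304 kB\n' | pool_from_meminfo    # prints 3072M
```

The 70-80% rule assumes the box is dedicated to MySQL; on a shared application server, leave considerably more headroom for the OS and other daemons.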

The CoolVDS Approach: Local Storage, No Nonsense

We built CoolVDS because we were tired of the "Cloud" ambiguity. We don't use massive, congested SANs that add milliseconds of latency. We use local RAID-10 arrays with high-performance caching.

By keeping storage local to the hypervisor, we eliminate the network hop required by traditional cloud storage. Whether you are running a high-traffic Magento store or a custom Java application, the physics are simple: closer data means faster delivery.

Furthermore, we are early adopters of Solid State technology. While others are still spinning 7.2k SATA drives for their VPS nodes, we are integrating high-speed storage tiers that mimic the promise of the upcoming NVMe standards.

Why this matters for your TCO

A slow server wastes your time. Waiting for `apt-get upgrade` to unpack because the disk is busy wastes money.

| Feature | Typical Cloud VPS | CoolVDS Norway |
| --- | --- | --- |
| Storage Backend | Networked SAN (Shared) | Local RAID-10 (Dedicated Feel) |
| IOPS Consistency | Fluctuates wildly | Stable & High |
| Latency to Oslo | 20-40ms (if in Frankfurt/London) | < 5ms |
| Virtualization | Often Container-based (Oversold) | KVM (Kernel-based Virtual Machine) |

Conclusion

In 2011, CPU is cheap. RAM is affordable. Storage I/O is the premium resource. Don't let a slow disk ruin your application's responsiveness. If you need low latency, data sovereignty under Norwegian law, and raw disk throughput, you need to move away from generic clouds and onto dedicated-resource virtualization.

Stop waiting on I/O. Deploy a high-performance KVM instance in our Oslo datacenter today.