Cloud Storage Architecture: Why 2010's SAN Implementations Are Killing Your I/O in 2012

Let’s be honest with ourselves: the "Cloud" promise of 2010—infinite scalability on cheap commodity hardware—has hit a hard wall of reality. If you were managing infrastructure two years ago, you likely jumped on the SAN (Storage Area Network) bandwagon. It seemed logical. Centralize storage, detach compute, and scale. But here we are in late 2012, and if you are running a high-concurrency database on a shared magnetic SAN, you are likely waking up to alerts about iowait spiking through the roof.

I speak from the perspective of a CTO who has watched "cost-effective" architecture turn into a performance nightmare. The bottleneck is no longer CPU or RAM; it is disk I/O. In the Norwegian market specifically, where latency to the NIX (Norwegian Internet Exchange) in Oslo is measured in single-digit milliseconds, having your application stall for 200ms waiting for a spinning platter in a centralized storage array is unacceptable.

The Rotational Latency Trap

In 2010, SAS 15k RPM drives were the gold standard. Today, they are artifacts. The physics simply doesn't work for modern web applications like Magento or heavy Drupal installs. A 15k RPM drive pushes roughly 180-200 random IOPS. Put ten of them in a RAID 10 and, because every write lands on both halves of a mirror, you get roughly 1,000 random write IOPS for the whole array. Now put fifty virtual machines on that array and each one is left with about 20 IOPS. The math fails immediately, as the back-of-envelope calculation below shows.
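
Here is that arithmetic in shell form; the drive count, per-drive IOPS, and VM count are the illustrative figures from above, not measurements:

# Back-of-envelope IOPS budget for a shared magnetic array
# (~200 random IOPS per 15k SAS drive; RAID 10 mirroring halves write IOPS)
DRIVES=10
IOPS_PER_DRIVE=200
VMS=50

WRITE_BUDGET=$(( DRIVES * IOPS_PER_DRIVE / 2 ))
echo "Array write budget: ${WRITE_BUDGET} IOPS"           # ~1000
echo "Per-VM share:       $(( WRITE_BUDGET / VMS )) IOPS" # ~20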

I recently audited a client's setup running on a popular "cloud" provider using centralized storage. Their MySQL 5.5 master was choking. We ran diagnostics and found the disk queue length was consistently above 10.

Diagnosing the Bottleneck

If you suspect your storage solution is stuck in 2010, stop guessing and look at the kernel data. On your CentOS 6 or Ubuntu 12.04 LTS instances, use iostat to expose the truth.

# Install sysstat if you haven't already (CentOS 6)
yum install -y sysstat

# On Ubuntu 12.04: apt-get install -y sysstat

# Check extended statistics every 1 second
iostat -x 1

You will see output resembling this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.50    0.00    2.10   45.30    0.00   48.10

Device:         rrqm/s   wrqm/s     r/s     w/s   svctm   %util
sda               0.00    15.00   45.00   80.00    7.80   98.50

Look at that %iowait (45.30%) and %util (98.50%). This server is doing almost nothing but waiting for the disk to spin. The CPU sits idle, wasting money, because the storage subsystem cannot keep up. This is the hallmark of legacy SAN architectures that oversubscribe magnetic storage.
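
To put a hard number on what the volume can actually sustain, run a short synthetic benchmark with fio. A minimal sketch, assuming fio is installed (it is typically available from EPEL on CentOS 6 and the standard Ubuntu 12.04 repositories) and that you run it against a test volume you can afford to thrash:

# 60-second 4k random-write test with direct I/O (bypasses the page cache)
fio --name=randwrite-test --rw=randwrite --bs=4k --size=1G \
    --direct=1 --numjobs=4 --runtime=60 --time_based --group_reporting

# A shared magnetic SAN volume typically reports a few hundred IOPS here;
# a local SSD array reports tens of thousands.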

The Solution: Local SSD RAID 10

The industry is shifting, and we at CoolVDS have bet the farm on this shift. We abandoned the 2010 model of centralized SANs in favor of Local SSD RAID 10. By placing Solid State Drives directly inside the hypervisor node, we eliminate network latency and rotational delay entirely. We are seeing random read/write speeds exceeding 50,000 IOPS per node, compared to the paltry 1,000 of a legacy array.

However, simply buying SSD space isn't enough. You must configure your Linux kernel to treat the storage as non-rotational. If you are migrating your infrastructure to CoolVDS or any SSD-based platform, you must change your I/O scheduler.
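
Before touching the scheduler, confirm how the kernel currently classifies the disk. A minimal check, assuming the device shows up as sda (KVM guests often expose it as vda instead):

# 1 = kernel treats the device as a spinning disk, 0 = non-rotational (SSD)
cat /sys/block/sda/queue/rotational

# Virtualized SSDs are sometimes reported as rotational; you can override the
# flag manually (this does not survive a reboot)
echo 0 > /sys/block/sda/queue/rotational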

Tuning Linux for SSDs

The default scheduler in CentOS 6 is usually CFQ (Completely Fair Queuing), which is optimized for spinning platters to minimize seek head movement. SSDs have no seek heads. CFQ actually slows them down. You need to switch to noop or deadline.

Check your current scheduler:

cat /sys/block/sda/queue/scheduler
[cfq] deadline noop

To change this instantly without a reboot (run as root):

echo noop > /sys/block/sda/queue/scheduler
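
If the node exposes more than one virtual disk, switch each of them. A small loop covers it; the sd*/vd* globs are an assumption about how your hypervisor names the devices:

# Set noop on every sd*/vd* block device that exposes a scheduler file
for DEV in /sys/block/sd* /sys/block/vd*; do
    [ -e "$DEV/queue/scheduler" ] && echo noop > "$DEV/queue/scheduler"
done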

To make it permanent, set the scheduler on the kernel command line. On CentOS 6, append elevator=noop to the kernel line in /boot/grub/grub.conf (menu.lst is a symlink to it); on Ubuntu 12.04, which uses GRUB 2, add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and run update-grub. Both are sketched below.
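
The kernel version and root device in this sketch are placeholders; keep whatever your existing entry contains and only append the elevator parameter:

# /boot/grub/grub.conf on CentOS 6 -- append elevator=noop to the kernel line
title CentOS (2.6.32-279.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/sda1 elevator=noop
        initrd /initramfs-2.6.32-279.el6.x86_64.img

# /etc/default/grub on Ubuntu 12.04 (GRUB 2), followed by running update-grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=noop"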

Database Optimization for Flash Storage

Once you have the hardware (CoolVDS SSD instances) and the OS scheduler fixed, you need to tell MySQL that it has room to breathe. The default innodb_io_capacity is often set to 200, assuming slow disks. On our infrastructure, you can crank this up safely.

Pro Tip: Don't just increase buffers blindly. If you are on SSDs, increasing the I/O capacity allows InnoDB to flush dirty pages faster, keeping your buffer pool clean for new data.

Edit your /etc/my.cnf:

[mysqld]
# Default is often 200, safe to increase on SSD
innodb_io_capacity = 2000
innodb_read_io_threads = 8
innodb_write_io_threads = 8

# innodb_flush_neighbors is a MySQL 5.6 variable; Percona Server 5.5 ships an
# equivalent knob, innodb_flush_neighbor_pages
innodb_flush_neighbors = 0

Setting innodb_flush_neighbors = 0 is critical. On spinning disks, it made sense to flush adjacent pages to write them in a single sweep. On SSDs, random writes are fast, so this logic just adds CPU overhead. Turn it off.
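
The I/O capacity change doesn't even require a restart: innodb_io_capacity is dynamic in MySQL 5.5. A minimal sketch, assuming you can reach the server with the mysql client as root (the thread-count settings above still need a restart to take effect):

# Raise the flush rate on the running server
mysql -u root -p -e "SET GLOBAL innodb_io_capacity = 2000;"

# Watch the dirty page count -- on SSD it should drain noticeably faster
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';"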

Data Sovereignty: The Norwegian Context

Beyond raw IOPS, the other major shift from 2010 to 2012 is the legal landscape. The EU Data Protection Directive (95/46/EC) is being enforced more rigorously, and the Datatilsynet (Norwegian Data Protection Authority) is watching.

Many CTOs were tempted by cheap US-based cloud storage (Amazon S3, etc.). But with the concerns regarding the US Patriot Act, storing sensitive Norwegian customer data—like personal IDs or health records—on servers physically located in Virginia is a compliance risk many can no longer tolerate.

Feature            US Mega-Cloud                  CoolVDS (Norway)
Latency to Oslo    80ms - 120ms                   < 5ms
Storage backend    Networked SAN (often varied)   Local SSD RAID 10
Jurisdiction       USA (Patriot Act)              Norway (EU/EEA privacy laws)

For a Norwegian business, keeping data inside the country isn't about nationalism; it's about latency and legal certainty. When you host on CoolVDS, your data sits in Oslo: round trips to local users take a handful of milliseconds, and the data stays under Norwegian and EU/EEA jurisdiction rather than exposed to foreign subpoenas.

Moving Forward

The era of accepting "cloud wait times" is over. Technology in 2012 allows us to have the flexibility of virtualization with the raw speed of dedicated hardware, provided you choose the right architecture. Don't let your application die a slow death on legacy magnetic storage.

Audit your I/O wait today. If it's over 10%, it's time to move. Deploy a test instance on CoolVDS in 55 seconds and feel the difference of pure SSD performance.