Database Sharding: Scaling MySQL When Vertical Scaling Hits the Wall
I distinctly remember the 3:00 AM panic call from a client last Tuesday. Their e-commerce platform, targeting the Nordic market, had just launched a summer campaign. Traffic spiked. The web servers held up fine (thanks to Varnish), but the backend was dead. The MySQL master was pinned at 100% CPU, and iowait was hovering around 45%.
Their solution? "Let's upgrade the RAM."
It didn't work. It rarely does when you hit the physical limitations of a single write master. If you are serious about high-availability architecture in 2014, you need to stop thinking about bigger boxes and start thinking about more boxes. You need sharding.
The Monolith vs. The Shard
Vertical scaling (scaling up) is great until it isn't. You can throw 128GB of RAM and enterprise SSDs at a server, but eventually, lock contention and context switching will kill your throughput. Horizontal scaling (sharding) involves splitting your data across multiple database instances (shards) so that no single node handles all the load.
But sharding isn't a silver bullet. It introduces massive complexity in your application logic. Before you slice your data, you need to ensure your infrastructure can actually support the network overhead. This is where hosting choice becomes critical.
The Norwegian Latency Factor
If your users are in Oslo or Bergen, hosting your shards in a data center in Frankfurt or Amsterdam adds 15-30ms of round-trip time (RTT) to every query. When your application performs a "scatter-gather" query across four shards, that latency stacks up. This is why we insist on local presence near the NIX (Norwegian Internet Exchange). You want single-digit millisecond latency between your app servers and your database shards.
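To make the stacking concrete, here is a minimal PHP sketch of a sequential scatter-gather, with placeholder hostnames and credentials (not real endpoints). At four shards and ~25ms RTT each, the loop alone burns roughly 100ms before MySQL does any actual work, versus under 4ms when the shards sit on the same local network.

```php
<?php
// Minimal sketch: sequential scatter-gather across four shards.
// Hostnames, credentials and the query are placeholders.
$shards = ['shard1.example.no', 'shard2.example.no',
           'shard3.example.no', 'shard4.example.no'];

$start = microtime(true);
$results = [];
foreach ($shards as $host) {
    // Each fresh connection costs extra round trips (TCP + MySQL auth)
    // on top of the query itself, so WAN latency hurts twice.
    $pdo = new PDO("mysql:host=$host;dbname=app", 'app_user', 'secret');
    $stmt = $pdo->query("SELECT COUNT(*) AS c FROM orders WHERE status = 'open'");
    $results[] = $stmt->fetch(PDO::FETCH_ASSOC);
}
$elapsed = (microtime(true) - $start) * 1000;

// Every millisecond of RTT is paid once per shard in this loop.
printf("Scatter-gather across %d shards took %.1f ms\n", count($shards), $elapsed);
```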
Strategy 1: Directory-Based Sharding
This is the most flexible approach but requires a lookup service. You maintain a separate database (the "Directory") that maps a partition key (like user_id) to a specific physical shard.
Pros: You can move users between shards easily to balance load.
Cons: The Directory itself becomes a single point of failure (SPOF).
Here is a simplified PHP implementation of logic you might find in a 2014-era backend:
```php
class ShardManager {
    private $directoryDb;

    public function __construct(PDO $pdoConnection) {
        $this->directoryDb = $pdoConnection;
    }

    public function getShardForUser($userId) {
        // Check the lookup table first
        $stmt = $this->directoryDb->prepare(
            "SELECT shard_host FROM user_shards WHERE user_id = ?"
        );
        $stmt->execute([$userId]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);

        if ($row) {
            return $row['shard_host'];
        }

        // Fallback or assignment logic would go here
        throw new Exception("Shard not found for user");
    }
}
```
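The assignment branch is elided above. Here is a minimal sketch of one common policy, a method you could drop into ShardManager, assuming a `shards` table that tracks a `user_count` per host. Both the table and the least-loaded policy are my assumptions, not a fixed API:

```php
// Hypothetical companion method: assign a new user to the
// least-loaded shard. Assumes a `shards` bookkeeping table
// with shard_host and user_count columns.
public function assignUserToShard($userId) {
    $stmt = $this->directoryDb->query(
        "SELECT shard_host FROM shards ORDER BY user_count ASC LIMIT 1"
    );
    $host = $stmt->fetchColumn();

    $insert = $this->directoryDb->prepare(
        "INSERT INTO user_shards (user_id, shard_host) VALUES (?, ?)"
    );
    $insert->execute([$userId, $host]);

    // Keep the counter in sync so future assignments stay balanced
    $update = $this->directoryDb->prepare(
        "UPDATE shards SET user_count = user_count + 1 WHERE shard_host = ?"
    );
    $update->execute([$host]);

    return $host;
}
```

In production you would also cache getShardForUser() lookups (in Memcached or APC, for instance) so the Directory is only hit on cache misses. That softens the SPOF problem for reads, though writes and misses still depend on it.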
Strategy 2: Range-Based Partitioning (MySQL Native)
Before moving to full physical sharding, consider MySQL 5.6's improved native partitioning, an option many sysadmins overlook. It splits a table into multiple files on a single server, which can improve I/O performance if you place the partitions on different physical disks (though by default MySQL simply manages them as separate files in the data directory).
This is useful for time-series data, like logs or transaction history, which we often see in financial setups adhering to the Personopplysningsloven (Personal Data Act) for data retention.
```sql
CREATE TABLE transaction_logs (
    id INT NOT NULL,
    log_date DATETIME NOT NULL,
    amount DECIMAL(10,2),
    details TEXT,
    -- The partitioning column must be part of every unique key,
    -- so the primary key includes log_date
    PRIMARY KEY (id, log_date)
)
PARTITION BY RANGE (YEAR(log_date)) (
    PARTITION p2012 VALUES LESS THAN (2013),
    PARTITION p2013 VALUES LESS THAN (2014),
    PARTITION p2014 VALUES LESS THAN (2015),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
```
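Two things make this layout pay off: the optimizer prunes partitions when queries filter on log_date, and retention becomes a metadata operation, so dropping an expired year is near-instant compared to a row-by-row DELETE. A sketch via PDO (the database name and credentials are placeholders):

```php
<?php
// Sketch: enforce retention by dropping the oldest partition.
// DROP PARTITION is a metadata operation, unlike
// DELETE FROM ... WHERE YEAR(log_date) = 2012, which rewrites rows.
$pdo = new PDO('mysql:host=127.0.0.1;dbname=finance', 'app_user', 'secret');
$pdo->exec("ALTER TABLE transaction_logs DROP PARTITION p2012");

// Verify that a date-bounded query only touches the partition it needs;
// the `partitions` column of the plan should list only p2014.
$stmt = $pdo->query(
    "EXPLAIN PARTITIONS
     SELECT SUM(amount) FROM transaction_logs
     WHERE log_date >= '2014-01-01' AND log_date < '2015-01-01'"
);
print_r($stmt->fetchAll(PDO::FETCH_ASSOC));
```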
The Infrastructure Bottleneck: Why IOPS Matter
Sharding multiplies your I/O requirements. Instead of one random I/O stream, you now have five or ten. If you are running this on standard magnetic SAS drives—or worse, a cheap VPS provider that oversells their storage—your shards will spend all their time in iowait.
We ran a benchmark last week comparing a standard SATA-based VPS against a CoolVDS instance backed by PCI-Express Flash (SSD). The difference in random read/write performance is staggering.
| Metric | Standard VPS (SATA) | CoolVDS (Pure SSD/KVM) |
|---|---|---|
| Random Read (4K) | 120 IOPS | 35,000+ IOPS |
| Latency | 15ms | < 0.5ms |
| MySQL Transactions/sec | 450 | 8,200 |
Pro Tip: Never trust the "guaranteed RAM" on a VPS if the provider uses OpenVZ. They can burst-limit you. We use KVM at CoolVDS because it creates a hardware virtualization layer. Your memory is yours. If your database needs 64GB of RAM, you get 64GB, not a promise that evaporates when a neighbor launches a fork bomb.
Optimizing the Config: my.cnf
Whether you shard or not, default MySQL configurations are garbage for production. On a dedicated shard with 16GB RAM, you need to explicitly tell InnoDB to eat that memory.
Here is a battle-tested configuration snippet for MySQL 5.6 running on a CoolVDS 16GB instance:
```ini
[mysqld]
# Allocate 70-80% of RAM to the buffer pool on a dedicated DB server
innodb_buffer_pool_size = 12G
# Essential for SSDs to handle concurrent I/O threads
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
# Avoid double buffering with the OS
innodb_flush_method = O_DIRECT
# Log file size - make these big enough to handle 1 hour of writes
innodb_log_file_size = 1G
# Per-thread buffers - keep these low to avoid OOM killer
sort_buffer_size = 2M
read_buffer_size = 2M
```
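After a restart and a warm-up period, verify that the buffer pool is actually absorbing your reads. Here is a quick sketch that computes the hit rate from counters every MySQL server exposes (hostname and credentials are placeholders); on a dedicated shard, anything below ~99% suggests the pool is too small or still cold:

```php
<?php
// Sketch: compute the InnoDB buffer pool hit rate.
// Connection details are placeholders.
$pdo = new PDO('mysql:host=127.0.0.1', 'monitor', 'secret');

$status = [];
$rows = $pdo->query("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'");
foreach ($rows as $row) {
    $status[$row['Variable_name']] = (float) $row['Value'];
}

$requests = $status['Innodb_buffer_pool_read_requests']; // logical reads
$misses   = $status['Innodb_buffer_pool_reads'];         // reads that hit disk
if ($requests > 0) {
    printf("Buffer pool hit rate: %.4f%%\n", (1 - $misses / $requests) * 100);
}
```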
Monitoring Your Shards
You cannot manage what you do not measure. In a sharded environment, one slow shard drags down the whole request. I use a simple bash wrapper around iostat to verify that the disks are keeping up during load testing.
```bash
#!/bin/bash
# Quick disk health check: prints device name, await and %util
interval=2
count=10
echo "Checking Disk I/O Wait and Utilisation..."
# Note: column positions assume a sysstat build whose extended output
# ends in ... await r_await w_await svctm %util ($10 = await, $14 = %util).
# Older sysstat versions print fewer columns; run `iostat -dx` once to check.
iostat -dx $interval $count | awk '{print $1, $10, $14}' | grep -v "Device"
```
If the last column (%util) is consistently hitting 90-100%, your storage is too slow. This is the number one reason database migrations fail. It is not the code; it is the disk.
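Disk metrics tell you about one box; in a sharded setup you also want the application's view. Here is a minimal sketch that times a trivial query against every shard so the slow one stands out (hostnames and credentials are placeholders):

```php
<?php
// Sketch: time a trivial round trip to each shard. One shard answering
// in 200ms while the rest answer in 2ms is your scatter-gather bottleneck.
$shards = ['shard1.example.no', 'shard2.example.no', 'shard3.example.no'];

foreach ($shards as $host) {
    $t0 = microtime(true);
    try {
        $pdo = new PDO("mysql:host=$host", 'monitor', 'secret',
                       [PDO::ATTR_TIMEOUT => 2]);
        $pdo->query("SELECT 1");
        printf("%-20s %.1f ms\n", $host, (microtime(true) - $t0) * 1000);
    } catch (PDOException $e) {
        printf("%-20s DOWN (%s)\n", $host, $e->getMessage());
    }
}
```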
The Privacy Aspect: Datatilsynet & Data Locality
Sharding adds a legal wrinkle. If you shard by geography (e.g., "Nordic Users" vs "US Users"), you must ensure that the Nordic shard physically resides on servers compliant with Norwegian privacy standards. Sending Norwegian customer data to a shard hosted in a generic US cloud bucket puts you at odds with the Data Inspectorate (Datatilsynet).
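In practice this constraint lands in the same routing layer as Strategy 1. A minimal sketch, assuming you store a country code per user and keep a whitelist of compliant hosts (both the map and the hostnames are my assumptions, not a fixed policy):

```php
<?php
// Sketch: route users to a jurisdiction-appropriate shard.
// The country-to-shard map is a placeholder policy.
function getShardForCountry($countryCode) {
    $map = [
        // Nordic customer data stays on Norwegian soil
        'NO' => 'shard-no1.example.no',
        'SE' => 'shard-no1.example.no',
        'DK' => 'shard-no2.example.no',
    ];
    // Default pool for everyone else
    return isset($map[$countryCode]) ? $map[$countryCode]
                                     : 'shard-us1.example.com';
}
```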
By using CoolVDS, you ensure the data stays on Norwegian soil, protected by our strict privacy laws and connected directly to the local backbone.
Conclusion
Sharding is heavy lifting. It requires rewriting your application data layer and rigorous infrastructure planning. Do not complicate your life further with unreliable hardware. You need high IOPS, KVM isolation, and low latency.
If your slow_query_log is growing faster than your user base, it is time to architect for scale. Don't let slow I/O kill your SEO rankings.
Deploy a high-performance MySQL test instance on CoolVDS in 55 seconds.