When `UPDATE` Becomes a Bottleneck: A Practical Guide to Sharding
Let’s be honest. We all love vertical scaling. It’s easy. You throw more RAM at the problem, bump the `innodb_buffer_pool_size`, switch from spinning rust to SSDs, and go home. But there comes a terrifying moment in every system architect's life when the hardware graph hits the top-right corner and stays there.
With the recent invalidation of the Safe Harbor agreement by the ECJ just last month (October 2015), the pressure is on. Not only do we need performance, but we also need data sovereignty. You can't just casually dump your shards onto an AWS US-East bucket anymore without risking the wrath of Datatilsynet (The Norwegian Data Protection Authority). If you are running high-traffic applications in Norway, you need local infrastructure that screams performance.
Today, we aren't talking about rudimentary optimizations. We are talking about Sharding—the calculated act of splitting your database horizontally to survive massive write loads.
The "War Story": When the Master Dies
I recently consulted for a Norwegian e-commerce platform anticipating the Black Friday rush. They were running a standard Master-Slave MySQL 5.6 setup on a generic cloud provider. Their `iowait` was hovering at 40%. The application was sluggish. Why? Because while slaves help with reads, every single write still has to go through the master, making it both a bottleneck and a single point of failure. Every order, every session update, every cart modification had to hit one single disk array.
We moved them to a sharded architecture on KVM-based VPS instances. The result? Write latency dropped from 200ms to 8ms.
Sharding Strategies for 2015
Sharding isn't a feature you toggle in `my.cnf`. It's an architectural decision implemented at the application level (unless you are brave enough to try MySQL Fabric, which frankly, is still a bit rough around the edges).
1. Key-Based (Hash) Sharding
This is the most common approach. You take a distinct key (like `user_id`), hash it, and use the modulo operator to determine which server holds the data.
```
$numberOfShards = 4;
// Integer keys like $userId can feed the modulo directly;
// hash string keys first (e.g. with crc32()) before taking the remainder.
$shardId = $userId % $numberOfShards;
$connectionString = "db_shard_" . $shardId;
```
Pros: Even distribution of data.
Cons: Resharding is a nightmare. If you go from 4 to 5 servers, nearly all data must move. Consistent Hashing helps, but it adds complexity.
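Consistent hashing deserves an article of its own, but the core idea fits in a short sketch. Everything specific below is an illustrative assumption (crc32 as the hash, 64 virtual nodes, the shard names): each shard occupies several points on a ring, a key belongs to the first shard clockwise from its hash, and a newly added shard only claims the keys that fall into its own slices instead of forcing a near-total reshuffle.
```
<?php
// Minimal consistent-hash ring: an illustrative sketch, not production code.
// Each shard is placed at several "virtual node" positions on the ring so that
// adding or removing a shard only remaps the keys that fall into its slices.
class HashRing
{
    private $ring = [];        // ring position => shard name
    private $positions = [];   // sorted ring positions

    public function __construct(array $shards, $virtualNodes = 64)
    {
        foreach ($shards as $shard) {
            for ($i = 0; $i < $virtualNodes; $i++) {
                $this->ring[crc32($shard . '#' . $i)] = $shard;
            }
        }
        $this->positions = array_keys($this->ring);
        sort($this->positions);
    }

    public function lookup($key)
    {
        $hash = crc32((string) $key);
        // Walk clockwise: the first position at or past the hash owns the key
        foreach ($this->positions as $position) {
            if ($position >= $hash) {
                return $this->ring[$position];
            }
        }
        return $this->ring[$this->positions[0]]; // wrapped past the last position
    }
}

$ring = new HashRing(['db_shard_0', 'db_shard_1', 'db_shard_2', 'db_shard_3']);
$shard = $ring->lookup(452); // the shard that currently owns user 452
```
In practice you would build the ring once per process (or keep it in shared configuration) rather than per request, and a binary search over the positions keeps lookups cheap even with thousands of virtual nodes.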
2. Directory-Based Sharding
You maintain a lookup table (a "Directory Service") that maps IDs to specific database hosts. Before a query, your app asks the directory: "Where does User 452 live?"
Pros: Extremely flexible. You can move individual users to different hardware without re-hashing everyone.
Cons: The lookup table becomes your new single point of failure. It must be cached heavily (Memcached or Redis) to avoid latency penalties.
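To make that trade-off concrete, here is a rough sketch of a cached directory lookup. The `shard_directory` table, the cache key prefix, the DSNs, credentials and the five-minute TTL are all assumptions for illustration, not a prescribed schema.
```
<?php
// Illustrative directory lookup: check the cache first, fall back to the
// directory database on a miss. Table/key names and DSNs are assumptions.
function getShardForUser($userId, Memcached $cache, PDO $directoryDb)
{
    $cacheKey = 'shard_of_user_' . $userId;

    $shardHost = $cache->get($cacheKey);
    if ($shardHost !== false) {
        return $shardHost;                   // cache hit: no directory round trip
    }

    // Cache miss: ask the directory where this user lives
    $stmt = $directoryDb->prepare(
        'SELECT shard_host FROM shard_directory WHERE user_id = ?'
    );
    $stmt->execute([$userId]);
    $shardHost = $stmt->fetchColumn();

    $cache->set($cacheKey, $shardHost, 300); // cache the mapping for 5 minutes
    return $shardHost;
}

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);
$directoryDb = new PDO('mysql:host=directory.internal;dbname=routing', 'app', 'secret');

$shardHost = getShardForUser(452, $cache, $directoryDb);
$shardDb   = new PDO("mysql:host={$shardHost};dbname=shop", 'app', 'secret');
```
The operational gotcha: when you migrate a user to different hardware, the cached mapping goes stale, so cache invalidation has to be part of the migration procedure, not an afterthought.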
Hardware Matters: The I/O Reality
Software architecture cannot fix bad hardware. In 2015, if you are running a database on standard HDDs (SAS/SATA), you are already behind. Sharding on slow disks just gives you multiple slow databases.
This is where the underlying virtualization technology becomes critical. Container-based hosting (like OpenVZ) often suffers from "noisy neighbors" where another customer's bad query eats your I/O.
| Storage Type | Random Read IOPS | Write Latency | Suitability |
|---|---|---|---|
| 7.2k SATA HDD | ~80 - 100 | High | Backups / Archives |
| 15k SAS HDD | ~180 - 200 | Medium | Legacy ERP |
| Enterprise SSD (CoolVDS Standard) | 10,000+ | < 1ms | Transactional DBs |
Pro Tip: Always align your filesystem to the storage block size. On our CoolVDS Linux templates, we recommend using XFS for database partitions over EXT4 for better concurrency handling under heavy threaded loads.
Configuring MySQL 5.7 for Shards
MySQL 5.7 GA was released recently (October 2015), and it brings significant improvements. However, most of you are likely still on 5.6. Regardless of the version, each shard acts as an independent master. Here is a baseline configuration for a 16GB RAM instance dedicated to a single shard:
```
[mysqld]
# 70-80% of RAM for InnoDB
innodb_buffer_pool_size = 12G
# Essential for data integrity (ACID)
innodb_flush_log_at_trx_commit = 1
# Per-thread buffers (don't set too high or you'll OOM)
sort_buffer_size = 2M
read_buffer_size = 2M
# Connection handling
max_connections = 500
thread_cache_size = 50
# Binary logging (Critical for backups/replication)
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 7
server_id = 101  # Unique per shard
```
The Local Advantage
Latency is determined by physics. The speed of light is constant. If your users are in Oslo, Bergen, or Trondheim, and your database shard is in Frankfurt, you are adding 20-30ms of round-trip time (RTT) to every query. In a sharded environment, where an application might need to query two shards to build a view, that latency compounds.
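As a rough illustration (hosts, credentials and the query below are placeholders), a naive sequential fan-out pays the full round trip once per shard, so 25ms of RTT to a remote data centre becomes 50ms of pure network time before MySQL has executed a single statement.
```
<?php
// Naive sequential fan-out across two shards; hosts and query are placeholders.
// Each iteration pays the full network round trip before the next one starts.
$shards = [
    new PDO('mysql:host=db-shard-0.internal;dbname=shop', 'app', 'secret'),
    new PDO('mysql:host=db-shard-1.internal;dbname=shop', 'app', 'secret'),
];
$orders = [];
foreach ($shards as $db) {
    $stmt = $db->prepare('SELECT id, total FROM orders WHERE created_at > ?');
    $stmt->execute(['2015-11-27 00:00:00']);
    $orders = array_merge($orders, $stmt->fetchAll(PDO::FETCH_ASSOC));
}
```
You can parallelise the fan-out or cache aggressively, but you cannot parallelise away the speed of light; keeping the shards close to your users is the cheapest fix.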
Hosting within Norway isn't just about the Personopplysningsloven (Personal Data Act); it's about the user experience. Connecting to the Norwegian Internet Exchange (NIX) ensures your packets take the shortest path.
Conclusion
Sharding is complex. It breaks joins, complicates transactions, and makes backups harder. Do not shard prematurely. First, optimize your queries and upgrade your hardware. But when you simply cannot write data fast enough, sharding is the answer.
When you are ready to split your monolith, you need a foundation that won't buckle. At CoolVDS, we provide KVM-virtualized instances with dedicated resources and high-performance SSD storage located right here in Norway. We handle the infrastructure so you can handle the architecture.
Ready to test your sharding logic? Deploy a high-performance SSD VPS in 55 seconds.