The Sharding Trap: Database Scaling Strategies for High-Throughput Systems
I have a rule that I tell every Junior Engineer who walks into my office with a whiteboard marker and a glint in their eye: Do not shard until it hurts.
There is a dangerous misconception in the dev community, fueled by Medium articles from Silicon Valley unicorns, that you need distributed database architecture the moment you hit 100GB of data. You don't. In fact, implementing sharding prematurely is the fastest way to turn a manageable monolith into a distributed nightmare of race conditions, broken joins, and latency spikes.
It's May 2020, and hardware is fast. Ridiculously fast. We aren't running on spinning rust anymore. But eventually you hit the wall. Maybe it's write-lock contention, maybe it's the working set outgrowing RAM. When that happens, you need a strategy, not a guess. Let's look at how to shard without destroying your data integrity, specifically in the context of the high-performance infrastructure available here in Norway.
The Pain of Distributed Systems
Before we look at the code, understand the cost. Once you shard, you lose ACID compliance across nodes. JOIN operations become application-level logic. Uniqueness constraints are suddenly hard to enforce globally. If you are running an e-commerce platform targeting the Nordic market, latency is your enemy.
A query that used to take 2ms on a local socket might now take 15ms because it has to hop from your app server in Oslo to a database shard in a different rack (or worse, a different datacenter). If you are using standard VPS hosting with noisy neighbors, that 15ms tail latency will spike to 200ms during peak hours. This is why we stick to KVM virtualization at CoolVDS—kernel isolation matters when you are counting milliseconds.
Strategy 1: Application-Level Sharding (Consistent Hashing)
The most control you can get is handling the routing logic in your application code. The naive approach is shard_id = user_id % number_of_nodes. Do not do this. If you add a node, you have to migrate nearly all your data.
The battle-tested approach is Consistent Hashing. You map both your nodes and your data keys to a circle. When you add a node, you only remap a small fraction of keys.
Here is a Python 3.8 implementation of a simple Consistent Hash Ring. This logic would sit in your data access layer:
import hashlib
from bisect import bisect

class ConsistentHashRing:
    def __init__(self, nodes=None, replicas=3):
        # Each physical node gets `replicas` virtual points on the ring for a smoother distribution
        self.replicas = replicas
        self.ring = dict()
        self._sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        # MD5 is fast enough for distribution logic; we aren't using it for crypto here.
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            key = self._hash(f'{node}:{i}')
            self.ring[key] = node
            self._sorted_keys.append(key)
        self._sorted_keys.sort()

    def get_node(self, string_key):
        # Walk clockwise around the ring to the first virtual node at or past the key's hash
        if not self.ring:
            return None
        key = self._hash(string_key)
        idx = bisect(self._sorted_keys, key)
        if idx == len(self._sorted_keys):
            idx = 0
        return self.ring[self._sorted_keys[idx]]
# Usage
cluster = ConsistentHashRing(nodes=['db-shard-01.coolvds.internal', 'db-shard-02.coolvds.internal'])
target_shard = cluster.get_node('user_email@example.com')
print(f"Write data to: {target_shard}")
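To sanity-check the "small fraction" claim from earlier, here is a quick, purely illustrative comparison against naive modulo sharding, reusing the ConsistentHashRing class above (the shard names are made up):

# Illustrative only: how many of 10,000 keys change shards when we add an 11th node?
keys = [f'user-{n}@example.com' for n in range(10_000)]

old_nodes = [f'db-shard-{i:02d}' for i in range(1, 11)]   # 10 existing shards
new_nodes = old_nodes + ['db-shard-11']                   # add one more

# Naive modulo: a key stays put only if its hash lands on the same index in both layouts
moved_mod = sum(1 for k in keys if hash(k) % 10 != hash(k) % 11)

# Consistent hashing: only keys that fall into the new node's arcs move
old_ring = ConsistentHashRing(nodes=old_nodes)
new_ring = ConsistentHashRing(nodes=new_nodes)
moved_ring = sum(1 for k in keys if old_ring.get_node(k) != new_ring.get_node(k))

print(f"Modulo sharding:    {moved_mod / len(keys):.0%} of keys moved")
print(f"Consistent hashing: {moved_ring / len(keys):.0%} of keys moved")

With only three virtual replicas per node the split is lumpy; in practice you would bump replicas to something like 100 to even out the distribution.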
This approach is flexible but requires your application to be smart. It also couples your codebase to your topology, which can be messy during deployments.
Strategy 2: The Middleware Layer (ProxySQL)
If you are running MySQL 8.0, the pragmatic choice is often middleware. Tools like ProxySQL allow you to transparently route queries without changing application code. You can split reads and writes, or shard based on schema.
I've deployed ProxySQL in front of high-traffic Magento clusters to keep writes pinned to a single primary and avoid split-brain scenarios. It lets you define query rules. Here is how you can configure sharding rules keyed on user ID, assuming you've already defined your hostgroups.
-- Route point lookups for user IDs 0-999 to Hostgroup 10 (Shard A)
-- (regex routing assumes the app issues predictable predicates like WHERE user_id = 42)
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, 'app_user', '^SELECT.*WHERE user_id *= *[0-9]{1,3}([^0-9]|$)', 10, 1);
-- Route point lookups for user IDs >= 1000 (four or more digits) to Hostgroup 20 (Shard B)
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, 'app_user', '^SELECT.*WHERE user_id *= *[0-9]{4,}', 20, 1);
-- Load changes to runtime
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
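If all you need today is the read/write split mentioned above rather than true sharding, the classic ProxySQL pattern is even simpler. A minimal sketch, assuming hostgroup 10 is your primary (writer) and hostgroup 20 holds your read replicas:

-- Keep locking reads on the primary
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (10, 1, '^SELECT.*FOR UPDATE$', 10, 1);
-- Send every other SELECT to the replicas
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (11, 1, '^SELECT', 20, 1);
-- Anything that matches no rule falls through to the user's default hostgroup (the writer)
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;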
The risk here is the single point of failure. If ProxySQL goes down, you are dead in the water. You need a clustered setup with Keepalived, ideally sharing a floating IP. On our CoolVDS platform, we support private networking which makes this internal traffic free and isolated from the public internet.
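For reference, the Keepalived side of that setup is only a few lines. A rough sketch with placeholder interface, router ID, and addresses you would adapt to your own private network:

vrrp_instance VI_PROXYSQL {
    state MASTER              # set to BACKUP on the second ProxySQL node
    interface eth0            # placeholder: the private network interface
    virtual_router_id 51      # must match on both nodes
    priority 100              # lower this on the BACKUP node
    advert_int 1
    virtual_ipaddress {
        10.0.10.100/24        # the floating IP your app servers point at
    }
}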
The "Do Nothing" Strategy: Vertical Scaling
Before you implement the complexity above, look at your metrics. top and iostat don't lie. Most "scaling" problems are actually I/O bottlenecks.
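Before reaching for a second database server, ask MySQL itself whether the working set fits in memory. One quick check:

-- Compare the two counters: if Innodb_buffer_pool_reads (requests that had to
-- hit disk) is more than a percent or two of Innodb_buffer_pool_read_requests,
-- your working set is spilling out of the buffer pool and you are I/O bound.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';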
In 2020, NVMe storage is a game-changer. A standard SATA SSD pushes maybe 500-600 MB/s. An NVMe drive can push 3,500 MB/s. If your database is I/O bound, simply moving to a CoolVDS instance with local NVMe storage can give you 5x the throughput without writing a single line of sharding code.
Check your MySQL configuration. If you are on high-speed storage, you must tell the database engine to use it. The default innodb_io_capacity is often set too low (200) for modern VPS environments.
[mysqld]
# Optimize for NVMe storage
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_neighbors = 0 # NVMe handles random IO well, no need to group pages
# Ensure you are using enough RAM
innodb_buffer_pool_size = 12G # Assumes a 16GB VPS
innodb_log_file_size = 1G
Pro Tip: If you are hosting customer data for Norwegian clients, remember that the GDPR and Datatilsynet are watching. Complexity is the enemy of security. A single, well-secured, vertically scaled node is often easier to audit and secure than a 10-node sharded cluster scattered across availability zones.
Handling Globally Unique IDs
If you shard, you can no longer rely on AUTO_INCREMENT. You will get duplicate IDs on different shards. You have two options: UUIDs or Ticket Servers.
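A ticket server is just a tiny MySQL instance whose only job is handing out numbers. The classic Flickr-style pattern is a single-row table abused for its AUTO_INCREMENT counter (the table name here is arbitrary):

CREATE TABLE ticket_counter (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    stub CHAR(1) NOT NULL DEFAULT '',
    PRIMARY KEY (id),
    UNIQUE KEY (stub)
) ENGINE = InnoDB;

-- REPLACE keeps the table at one row while still bumping the counter
REPLACE INTO ticket_counter (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Run two of them with auto_increment_increment = 2 and different auto_increment_offset values (one even, one odd) so the ticket server itself is not a single point of failure.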
UUIDs (v4) are random and destroy your clustered index performance in MySQL because they cause page splitting. A better approach for 2020 is using Twitter Snowflake IDs or a custom ordered-UUID approach.
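If you would rather not make a network round-trip per ID, a Snowflake-style generator packs a millisecond timestamp, a machine ID, and a per-millisecond sequence into one 64-bit integer. A rough Python sketch follows; the custom epoch and the machine_id are arbitrary choices you would pin down per deployment:

import threading
import time

class SnowflakeGenerator:
    # Layout: 41 bits of milliseconds since a custom epoch | 10 bits machine ID | 12 bits sequence
    EPOCH_MS = 1577836800000  # 2020-01-01 00:00:00 UTC, an arbitrary custom epoch

    def __init__(self, machine_id):
        if not 0 <= machine_id < 1024:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence, max 4096 IDs/ms
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: spin until the clock ticks over
                    while now_ms <= self.last_ms:
                        now_ms = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now_ms
            return ((now_ms - self.EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

# Each shard (or app server) gets its own machine_id, so IDs never collide
generator = SnowflakeGenerator(machine_id=1)
print(generator.next_id())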
If you must use UUIDs in MySQL 8, swap the timestamp parts to make them sequential:
-- Using MySQL 8.0's UUID-to-binary conversion functions for efficient storage
CREATE TABLE users (
id BINARY(16) PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
INSERT INTO users (id, name, email)
VALUES (UUID_TO_BIN(UUID(), 1), 'Ola Nordmann', 'ola@norge.no');
-- Passing '1' to UUID_TO_BIN swaps the time-low and time-high parts
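When you read the row back, pass the same flag to BIN_TO_UUID() so the text form comes out with the bytes restored to their original order:

SELECT BIN_TO_UUID(id, 1) AS id, name, email FROM users;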
The Verdict
Sharding is powerful, but it's expensive in terms of engineering hours and maintenance. If you are pushing under 2TB of data, you probably just need better hardware and a tuned configuration.
Start with a high-performance replica set. Use read-replicas to offload `SELECT` queries. Optimize your indexes. And if your disk I/O is red-lining, migrate to infrastructure that doesn't choke on throughput.
At CoolVDS, we don't oversubscribe our CPU cores, and our storage arrays are strictly NVMe. It’s the closest you get to bare metal performance with the flexibility of a VPS. Test your heavy queries on a proper foundation before you complicate your architecture.