Scaling Beyond the Monolith: Practical Database Sharding Strategies for High-Throughput Systems

It starts with a slow query log. Then, CPU iowait creeps up during backups. Finally, you hit the wall: your primary database server, despite having 128GB of RAM and 32 vCPUs, locks up during peak traffic. I’ve been there. In 2019, I watched a major e-commerce platform in Oslo grind to a halt because we relied on a single master node for writes. We tried optimizing indexes, we tried caching aggressively with Redis, but physics is physics. Eventually, you run out of vertical runway.

This is where sharding comes in. It is not a silver bullet. In fact, it is an architectural nightmare if implemented poorly. But for systems handling terabytes of data with high write throughput, it is the only path forward. In this guide, we will dismantle the complexity of horizontal partitioning (sharding) using standard tools available in 2021, specifically focusing on MySQL 8.0 and ProxySQL.

The "Vertical" Trap and When to Break It

Before you shard, you must exhaust vertical scaling. Upgrading to a massive instance on CoolVDS with dedicated NVMe storage is almost always cheaper and simpler than rewriting your application for sharding. Sharding introduces network latency, consistency challenges, and operational overhead.

You are ready to shard only when:

  • Your active working set exceeds maximum available RAM.
  • Write throughput saturates the disk I/O of a single NVMe drive.
  • Schema changes (ALTER TABLE) take hours or days to complete.

Sharding Architectures: Key-Based vs. Directory-Based

The two most robust strategies for 2021 are Key-Based (Algorithmic) and Directory-Based (Lookup) sharding.

1. Key-Based Sharding (Algorithmic)

This uses a hash of a sharding key (like user_id or customer_id) to determine which server holds the data. It is fast because no lookup query is needed.

shard_id = hash(customer_id) % total_shards

The Pros: Extremely low latency. Ideal for high-speed VPS environments.
The Cons: Resharding (adding new servers) is painful because it changes the modulo result for nearly every key, forcing a massive data migration. Consistent hashing softens this, at the cost of more complex routing.
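
As a minimal sketch (Python, with a hypothetical four-shard cluster), the routing logic is just a stable hash and a modulo. Note the use of crc32 rather than Python's built-in hash(), which is randomized per process:

import zlib

TOTAL_SHARDS = 4  # hypothetical cluster size

def shard_for(customer_id: int) -> int:
    # crc32 is stable across processes and restarts, so the same
    # customer always maps to the same shard.
    key = str(customer_id).encode("utf-8")
    return zlib.crc32(key) % TOTAL_SHARDS

# Example: customer 91823 lands on the same shard every time
print(shard_for(91823))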

2. Directory-Based Sharding

You maintain a lookup table that maps an ID to a specific physical shard. This lets you move specific "heavy" tenants to their own dedicated hardware without moving everyone else.

The Pros: Flexible rebalancing; adding a shard never forces a global migration.
The Cons: Every query pays a lookup, and the directory becomes a critical dependency you must replicate and cache.
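
A minimal sketch of the lookup path in Python, assuming a hypothetical shard_directory table mapping customer_id to shard_id, with a small in-process cache so you do not pay a directory round trip on every query:

_directory_cache = {}

def lookup_shard(directory_conn, customer_id: int) -> int:
    # Cache hit: no network round trip to the directory.
    if customer_id in _directory_cache:
        return _directory_cache[customer_id]
    cur = directory_conn.cursor()
    cur.execute(
        "SELECT shard_id FROM shard_directory WHERE customer_id = %s",
        (customer_id,),
    )
    shard_id = cur.fetchone()[0]  # assumes the tenant has been provisioned
    _directory_cache[customer_id] = shard_id
    return shard_id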

Implementation: The Tech Stack

We will use a standard stack proven in production environments across Europe:

  • Database: MySQL 8.0 (InnoDB engine)
  • Routing Layer: ProxySQL 2.x
  • Infrastructure: CoolVDS High-Frequency Compute Instances

Step 1: ID Generation

Auto-increment integers fail in distributed systems: two shards will happily hand out the same ID. You need globally unique IDs. While UUIDs are common, their randomness fragments InnoDB's clustered index, killing insert performance. In 2021, the best practice is a Ticket Server (the Flickr strategy) or Twitter-style Snowflake ID generation.
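
If you go the Snowflake route, the ID is a 64-bit integer packing a millisecond timestamp, a worker ID, and a per-millisecond sequence. A minimal sketch in Python follows; the bit widths and epoch mirror Twitter's original layout, but treat the exact constants as assumptions to adapt (and note this sketch does not handle a clock moving backwards):

import time
import threading

EPOCH_MS = 1288834974657  # Twitter's custom epoch
WORKER_BITS, SEQ_BITS = 10, 12

class Snowflake:
    def __init__(self, worker_id: int):
        assert 0 <= worker_id < (1 << WORKER_BITS)
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & ((1 << SEQ_BITS) - 1)
                if self.seq == 0:
                    # Sequence exhausted within this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQ_BITS))
                | (self.worker_id << SEQ_BITS)
                | self.seq
            )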

Here is a simplified table structure for a ticket server:

CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `stub` char(1) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=InnoDB;

Instead of a plain INSERT, you mint each new ID with a REPLACE, which recycles the single row while bumping the counter:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
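
The detail that trips people up: LAST_INSERT_ID() is connection-scoped, so both statements must run on the same connection. A sketch of an application-side wrapper, assuming a DB-API style MySQL driver such as PyMySQL:

def next_ticket_id(conn) -> int:
    # REPLACE and the LAST_INSERT_ID() read MUST share one connection;
    # routing them through a pool that swaps connections breaks this.
    cur = conn.cursor()
    cur.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
    cur.execute("SELECT LAST_INSERT_ID()")
    return cur.fetchone()[0]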

Step 2: Configuring the Shards (MySQL)

Performance tuning is critical. On a CoolVDS instance with NVMe, you must ensure MySQL is not throttling itself. Standard my.cnf defaults are garbage for high load.

[mysqld]
# NVMe Optimization
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_neighbors = 0

# Memory - assume 16GB RAM instance
innodb_buffer_pool_size = 12G
innodb_log_file_size = 2G

# Connection Handling
max_connections = 1000
thread_cache_size = 50

Pro Tip: Set innodb_flush_neighbors = 0. Flushing neighboring pages was an optimization for the seek costs of spinning HDDs. On the NVMe drives we use at CoolVDS, it just issues extra, unnecessary writes and reduces throughput. Turn it off.

Step 3: Routing with ProxySQL

Instead of hardcoding shard IPs in your application code (which is messy), use ProxySQL. It sits between your app and the database, routing queries based on regex rules.

First, define your hostgroups in the ProxySQL admin interface:

-- Shard 1 (Hostgroup 10)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.0.5', 3306);
-- Shard 2 (Hostgroup 20)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.6', 3306);

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

Then, set up sharding rules. If you are sharding by user_id and your application appends a comment like /* shard=X */ to queries, routing becomes trivial:

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, 'shard=1', 10, 1);

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, 'shard=2', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
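
On the application side, the hint is just a comment prepended to the SQL string. A sketch in Python; one caveat to verify for your stack is that some client libraries strip comments before sending, in which case the hint never reaches ProxySQL:

def with_shard_hint(sql: str, shard: int) -> str:
    # The /* shard=N */ comment is what ProxySQL's match_pattern keys on.
    return f"/* shard={shard} */ {sql}"

# Routed to hostgroup 10 by rule_id 1 above:
# cur.execute(with_shard_hint("SELECT * FROM orders WHERE id = %s", 1), (91823,))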

The Infrastructure Factor: Why Latency Kills Shards

In a monolithic architecture, a join is a memory operation. In a sharded architecture, a join (if you fail to design for data locality) becomes a network operation. If your shards are hosted on overloaded shared hosting with "noisy neighbors," your query times will spike unpredictably.
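
When a cross-shard read is unavoidable, fan the sub-queries out in parallel so total latency approaches the slowest shard's round trip rather than the sum of all of them. A sketch with a thread pool; the per-shard connections and the orders query are hypothetical:

from concurrent.futures import ThreadPoolExecutor

def fetch_orders(conn, customer_ids):
    cur = conn.cursor()
    cur.execute(
        "SELECT customer_id, total FROM orders WHERE customer_id IN %s",
        (tuple(customer_ids),),
    )
    return cur.fetchall()

def scatter_gather(shard_conns, ids_by_shard):
    # Wall-clock time = max(shard RTT), not sum(shard RTT).
    with ThreadPoolExecutor(max_workers=len(shard_conns)) as pool:
        futures = [
            pool.submit(fetch_orders, shard_conns[shard], ids)
            for shard, ids in ids_by_shard.items()
        ]
        return [row for f in futures for row in f.result()]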

This is why CoolVDS focuses on KVM virtualization with dedicated resource allocation. For a database cluster:

  1. Private Networking: You need a backend network for replication and ProxySQL traffic that is separate from the public internet. Our private LAN offers unmetered gigabit speeds between instances.
  2. Geography Matters: If your customers are in Norway, hosting your shards in Frankfurt adds ~20-30ms per round trip. Hosting in Oslo keeps it under 2ms. For a query fetching data from 3 shards, that difference decides whether the user perceives the site as "instant" or "sluggish".

Compliance: The Schrems II Reality

We are writing this in mid-2021, and the fallout from the Schrems II ruling is still settling. The CJEU declared the Privacy Shield invalid. This means transferring personal data of EU/EEA citizens to US-controlled cloud providers is legally perilous. Datatilsynet (The Norwegian Data Protection Authority) has been clear about the risks.

Sharding data across international borders complicates this tenfold. By keeping your entire sharded cluster within Norwegian jurisdiction on CoolVDS, you simplify your GDPR compliance posture significantly. You know exactly where the physical drives are.

Final Thoughts: Don't Optimize Prematurely

If your database is under 500GB, you probably don't need sharding yet. You need a better server configuration. But if you are scaling into the terabytes, the architecture above—MySQL 8, ProxySQL, and NVMe-backed KVM instances—is the standard for stability.

Architecture is about trade-offs. You trade complexity for scale. Just make sure the foundation you build on—the VPS and the network—is solid enough to hold the weight.

Need to benchmark your sharding logic? Deploy a high-memory KVM instance on CoolVDS in Oslo. Spin-up takes less than 55 seconds.