Scaling Beyond the Monolith: Practical Database Sharding Strategies for High-Traffic Systems

There is a specific kind of silence that falls over a DevOps room when the primary database CPU pins at 100% and stays there. It’s not peaceful. It’s the sound of the monolith hitting the wall. I’ve been there during a Black Friday sale for a Norwegian e-commerce giant, watching queries pile up like snow on the E6 highway.

We threw RAM at it. We upgraded to the most expensive CPUs available. It bought us three months. Then the latency crept back up, from 50ms to 200ms, until timeouts started killing the checkout process. Vertical scaling (scaling up) is a trap. It feels safe because you don’t have to change your application logic, but eventually, you run out of physics. You can’t buy a processor with infinite cores.

The only way out is horizontal scaling. Sharding.

Sharding isn’t a silver bullet. It’s complex, it breaks joins, and it makes transactions a nightmare. But if you want to handle terabytes of data with sub-millisecond response times, it is the only path forward. Here is how we architect it without losing our minds (or our data).

The Architecture of the Split

Sharding involves breaking your large database into smaller, faster, more manageable pieces called shards. Each shard holds a subset of the data. All shards share the same schema, but the data rows are unique to each shard.

Before you even touch a config file, you must decide on a Sharding Strategy. This decision is usually permanent. Changing it later requires a massive migration.

1. Key-Based (Hash) Sharding

This is the most common method for ensuring even distribution. You take a value (like a user_id or customer_uuid), run it through a hash function, and use the result to determine which shard the data lives on.

shard_id = hash(routing_key) % total_shards

This spreads data evenly across your nodes, preventing "hotspots." However, adding new shards later means rebalancing existing keys, which is heavy on I/O. Consistent hashing algorithms soften the rebalancing cost, at the price of extra complexity.
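
As a concrete illustration, here is the same formula expressed in MySQL. This is a minimal sketch assuming four shards and CRC32 as the hash function (any stable hash works); in practice the router or the application computes this, not the database itself.

-- Sketch: CRC32 as the hash function, four shards (both are assumptions)
SELECT CRC32('a1b2c3d4-0000-4000-8000-000000000042') % 4 AS shard_id;
-- Returns a value between 0 and 3; the router sends the query to that shard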

2. Range-Based Sharding

Here, you shard based on ranges of values. IDs 1–1,000,000 go to Shard A. IDs 1,000,001–2,000,000 go to Shard B. This is intuitive and makes range queries efficient, since adjacent data is likely to sit on the same physical disk.
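
To make the routing concrete, here is a minimal sketch of the kind of range map a router could consult; the table and column names are illustrative, not part of any particular tool.

-- Hypothetical range map consulted by the router
CREATE TABLE shard_ranges (
    shard_id  INT NOT NULL,
    min_id    BIGINT NOT NULL,
    max_id    BIGINT NOT NULL
);

INSERT INTO shard_ranges VALUES (1, 1, 1000000), (2, 1000001, 2000000);

-- Which shard holds user 1500000?
SELECT shard_id FROM shard_ranges WHERE 1500000 BETWEEN min_id AND max_id;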

The Trap: If your application creates sequential IDs (like an auto-incrementing primary key), all your write traffic will hit the last shard. You create a write hotspot, effectively negating the benefit of sharding. I once saw a logging cluster melt down because they sharded by timestamp: every current write hit the same node.

3. Directory-Based Sharding

You maintain a lookup table service that tracks exactly which shard holds which data. It’s flexible—you can move data without changing keys—but the lookup service becomes a single point of failure and adds latency to every query.
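
A minimal sketch of such a directory, with hypothetical table and column names: one row per routing key, and once the rows have been copied, moving a key is a single update.

-- Hypothetical directory: one row per routing key
CREATE TABLE shard_directory (
    routing_key  BIGINT PRIMARY KEY,
    shard_id     INT NOT NULL
);

-- The router resolves the shard before forwarding the query
SELECT shard_id FROM shard_directory WHERE routing_key = 421337;

-- After the data has been copied, repointing the key is one statement
UPDATE shard_directory SET shard_id = 3 WHERE routing_key = 421337;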

Technical Implementation: The Proxy Layer

In 2022, we rarely code sharding logic directly into the application (unless you enjoy rewriting your ORM layer). We use middleware. For MySQL, ProxySQL is the standard. For PostgreSQL, we often look at Citus or native partitioning features introduced in Postgres 10 and refined in 14.

Let's look at a ProxySQL configuration example. We want to route traffic based on `user_id`. We define hostgroups for our shards.

INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES 
(10, '10.0.0.1', 3306), -- Shard 1
(20, '10.0.0.2', 3306); -- Shard 2

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

Then, we configure query rules. This is where the magic happens. We can use regex to inspect incoming SQL and route it.

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES
(1, 1, '^SELECT .* FROM users WHERE user_id % 2 = 0', 10, 1),
(2, 1, '^SELECT .* FROM users WHERE user_id % 2 = 1', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

Note: This is a simplified modulo example. In production, you would map sharding keys via `mysql_query_rules_fast_routing` or an external script for better distribution.
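
For reference, here is a hedged sketch of what `mysql_query_rules_fast_routing` could look like, assuming a schema-per-shard layout; the `app_user` account and the `shop_shard_*` schema names are placeholders. It routes on username, schema and flag instead of regex matching, which is far cheaper per query.

-- Sketch only: assumes each shard is exposed through its own schema
INSERT INTO mysql_query_rules_fast_routing (username, schemaname, flagIN, destination_hostgroup, comment)
VALUES
('app_user', 'shop_shard_0', 0, 10, 'shard 1'),
('app_user', 'shop_shard_1', 0, 20, 'shard 2');

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;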

The Infrastructure Reality: Latency Kills

When you shard, you introduce network hops. A single logical request might fan out to three different database nodes. If those nodes are on slow networks or crowded hosts, your application crawls.

Pro Tip: Never shard across different data centers for a synchronous transaction application. The speed of light is too slow. If your app servers are in Oslo, your database shards must be in Oslo.

This is where the underlying hardware of your VPS matters. You cannot run a sharded database on a standard shared hosting plan where "noisy neighbors" steal CPU cycles. You need isolation.

Kernel Tuning for Sharded Nodes

On your database nodes (we run these on Ubuntu 20.04 LTS), you must optimize the TCP stack to handle the increased inter-node communication. Default Linux settings are too conservative for high-throughput sharding.

Edit /etc/sysctl.conf:

# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Max backlog for incoming connections
net.core.somaxconn = 4096

# Increase max open files (essential for sharding)
fs.file-max = 500000

Apply with `sysctl -p`. Don't forget to check your `ulimit -n` settings as well. A sharded database opens thousands of file descriptors.
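
If the database runs under a classic init setup, the per-user limit can be raised in /etc/security/limits.conf; the `mysql` user below is an assumption, adjust it to whatever account your database runs as. For systemd-managed services (the default on Ubuntu 20.04), set LimitNOFILE= in a unit override instead.

# /etc/security/limits.conf -- assuming the database runs as the "mysql" user
mysql soft nofile 65535
mysql hard nofile 65535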

PostgreSQL: Native Partitioning

If you are in the PostgreSQL camp (we run version 14 here), declarative partitioning is robust. It isn't full cross-server sharding out of the box (you need an extension like Citus for that), but it is the first step: splitting a table across disks or logical spaces.

CREATE TABLE measurements (
    city_id         int not null,
    logdate         date not null,
    peaktemp        int,
    unitsales       int
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_y2022m01 PARTITION OF measurements
    FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');

CREATE TABLE measurements_y2022m02 PARTITION OF measurements
    FOR VALUES FROM ('2022-02-01') TO ('2022-03-01');

This allows the query planner to "prune" partitions. If you query for February data, Postgres won't even look at the January file. Combine this with NVMe storage, and I/O wait times virtually disappear.
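
You can verify the pruning with a plain EXPLAIN. With the partitions defined above, a February-only query should reference only measurements_y2022m02 in the plan.

-- The plan should scan only measurements_y2022m02
EXPLAIN SELECT sum(unitsales)
FROM measurements
WHERE logdate >= '2022-02-01' AND logdate < '2022-03-01';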

Compliance and Data Residency

We operate in Europe. We cannot ignore GDPR or the fallout from Schrems II. One often overlooked benefit of sharding is Geo-Partitioning. You can configure your sharding key such that all data belonging to Norwegian users (country_code='NO') lands physically on shards located in Oslo, while German data stays in Frankfurt.

This is huge for compliance. It ensures data residency requirements are met at the infrastructure level, not just via application code policies.
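
As a sketch of the same idea using the native partitioning shown above (table and column names are illustrative), list partitioning on the country code gives you one partition per jurisdiction. Pinning those partitions to hardware in Oslo or Frankfurt is then a matter of tablespaces, or of an extension like Citus that places shards on specific nodes.

CREATE TABLE customers (
    customer_id   bigint not null,
    country_code  char(2) not null,
    email         text
) PARTITION BY LIST (country_code);

CREATE TABLE customers_no PARTITION OF customers
    FOR VALUES IN ('NO');

CREATE TABLE customers_de PARTITION OF customers
    FOR VALUES IN ('DE');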

Why We Build on CoolVDS

I don't believe in magic infrastructure, but I believe in raw specs. When we deploy sharded clusters, we use CoolVDS for two reasons: KVM and NVMe.

Container-based VPS (like OpenVZ) often share the kernel and disk I/O queues. In a sharded setup, if one shard lags due to a neighbor's heavy I/O, the entire query waits. That tail latency kills the user experience. CoolVDS uses KVM virtualization, providing true kernel isolation.

Furthermore, sharding is I/O intensive. Database rebalancing involves moving gigabytes of data between nodes. Doing this on spinning rust (HDD) or even standard SATA SSDs is painful. The NVMe storage on CoolVDS allows us to rebalance shards without taking the application offline.

The Verdict

Sharding is a commitment. It requires maintenance, monitoring, and a solid DevOps culture. But when your user base grows past the limits of a single server, it is the only architecture that scales linearly.

Start small. Split your heaviest table first. Use ProxySQL or a similar router to abstract the complexity. And ensure your underlying metal is up to the task.

If you are planning a sharded architecture, latency and stability are your new gods. Deploy your test shards on infrastructure that respects them. Spin up a KVM-based, NVMe-powered instance on CoolVDS in under a minute and see how the network stack handles the load.