Database Sharding Architectures: Survival Strategies for High-Throughput Systems

There is a specific kind of silence that falls over an engineering room when the primary database CPU hits 100% and stays there. It’s not peaceful. It’s the sound of lost revenue. I experienced this firsthand last year managing the backend for a major Nordic retail platform during a flash sale. We had optimized every index, tuned innodb_buffer_pool_size to the edge of physical RAM, and cached aggressively with Redis. It didn't matter. The write locks were choking us.

We hit the "Vertical Wall." You can only buy a server with so much RAM and so many cores before the economics—and physics—stop making sense. If you are reading this, you are likely staring at that same wall.

The solution is sharding. Breaking your monolithic database into smaller, faster, manageable chunks. But sharding is not a silver bullet; it is complexity insurance. If you implement it poorly, you trade write bottlenecks for network latency and data inconsistency hell. Here is how we handle database sharding for high-scale production environments, focusing on the realities of 2019 technology stacks.

The Latency Tax: Why Infrastructure Matters

Before writing a single line of config, understand this: Sharding converts local function calls into network calls.

In a monolithic setup, joining tables is a memory operation. In a sharded setup, it often involves cross-node network traffic. If your shards are hosted on overloaded virtual machines with "noisy neighbors," your query times will fluctuate wildly. This is why we default to KVM-based virtualization at CoolVDS. Containers are great for stateless apps, but for persistent data stores, the weaker isolation and the potential for I/O contention in shared container environments are risks I refuse to take.

Furthermore, if your users are in Norway, your shards need to be in Norway. Round-trip time (RTT) from Oslo to Frankfurt is roughly 15-20ms. That doesn't sound like much until a single transaction touches four different shards in sequence: at 20ms per hop, that is around 80ms of pure network wait before any query work even starts. Suddenly, your application feels sluggish. Hosting locally in Oslo cuts that RTT to sub-millisecond levels.

Strategy 1: Application-Level Sharding (The "Do It Yourself" Approach)

The most straightforward way to shard is to handle the logic within your application code. You determine which shard to write to based on a Shard Key (e.g., user_id or customer_id).

Here is a simplified logic flow in PHP using modulo hashing over a static shard map:


class ShardManager {
    // Static shard map: each entry points at a dedicated MySQL instance
    private $shards = [
        'shard_01' => ['host' => '10.0.0.10', 'db' => 'users_01'],
        'shard_02' => ['host' => '10.0.0.11', 'db' => 'users_02'],
    ];

    public function getConnection($userId) {
        // Simple modulo hashing: the shard count determines placement
        $shardIndex = (int) $userId % count($this->shards);
        $target = array_values($this->shards)[$shardIndex];

        return new PDO(
            "mysql:host={$target['host']};dbname={$target['db']};charset=utf8mb4",
            'app_user',
            'secure_password', // load real credentials from the environment, not source code
            [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
        );
    }
}

The Pros: Absolute control. You know exactly where data lives.
The Cons: Migrating data (resharding) is a nightmare. If shard_01 fills up, you have to manually move users to a new shard_03 and update your hashing algorithm without downtime; with naive modulo hashing, adding a shard changes the target shard for almost every existing key.

Strategy 2: Middleware Routing (ProxySQL)

For MySQL workloads, I strongly advocate against putting routing logic in the app: every routing change then requires a code deployment. Instead, use ProxySQL. It sits between your app and your database nodes, parsing SQL traffic and routing it based on rules you define.

This allows you to add shards transparently. The application thinks it is talking to one giant MySQL server.

Configuration Example

First, we define our hostgroups (shards) in the ProxySQL admin interface:


-- Hostgroup 10 is Shard 1
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.0.10', 3306);
-- Hostgroup 20 is Shard 2
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.11', 3306);

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

Next, we create the sharding rules. We shard by user_id and use a query-comment convention (a common trick in 2019) to hint the routing:


-- Route based on query comments injected by the app
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '/\* sharding_id:1 \*/', 10, 1);

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, '/\* sharding_id:2 \*/', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
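
From the application side, the convention looks roughly like this. A minimal sketch, using a hypothetical users table and example IDs; the only thing ProxySQL inspects is the leading comment, which the application injects based on its own shard-key calculation:

-- Matched by rule 1, sent to hostgroup 10 (Shard 1)
/* sharding_id:1 */ SELECT * FROM users WHERE user_id = 1001;

-- Matched by rule 2, sent to hostgroup 20 (Shard 2)
/* sharding_id:2 */ UPDATE users SET last_login = NOW() WHERE user_id = 2002;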

With this setup, you can move data between shards in the background and simply update the ProxySQL rules. Zero code changes required.

Pro Tip: When running database shards, I/O is your bottleneck. On CoolVDS, we provision NVMe storage by default. Do not attempt to run a high-throughput sharded setup on spinning rust (HDD) or standard SSDs if you expect heavy write loads. The IOPS wait will kill your locking performance.

The PostgreSQL Approach: Citus (or Declarative Partitioning)

If you are in the PostgreSQL camp (and with Postgres 12 just released this month, it's a great time to be there), you have different options. Postgres 10 introduced native declarative partitioning, but that only splits data within a single instance. For multi-node sharding, Citus is the industry-standard extension.
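
For contrast, here is a minimal sketch of single-node declarative partitioning (hash partitioning requires Postgres 11 or later; the orders table and company_id column are illustrative and mirror the Citus example below):

-- All partitions still live on the same Postgres instance
CREATE TABLE orders (
    order_id   bigint,
    company_id bigint,
    total      numeric
) PARTITION BY HASH (company_id);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

This helps with partition pruning and maintenance, but it does nothing to spread load across machines, which is where Citus comes in.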

Citus transforms Postgres into a distributed database. You define a "coordinator" node and "worker" nodes.


-- On the coordinator node
CREATE EXTENSION citus;

-- Add worker nodes (CoolVDS instances)
SELECT * FROM master_add_node('10.0.0.10', 5432);
SELECT * FROM master_add_node('10.0.0.11', 5432);

-- Distribute the table
SELECT create_distributed_table('orders', 'company_id');

Citus handles the query routing automatically: it pushes computation down to the workers, aggregates the partial results, and returns them to the client. The practical effect is that a fleet of modest 4GB RAM VPS instances behaves like a single database with the combined memory, CPU, and I/O of the whole cluster.
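
A rough sketch of how that plays out in practice, reusing the orders table from above (company 42 is just an illustrative value): a query that filters on the distribution column is routed to the single worker holding that shard, while a query without the filter fans out to every shard and is merged on the coordinator.

-- Pinned to one worker: the WHERE clause fixes the distribution column
SELECT count(*) FROM orders WHERE company_id = 42;

-- Fans out to all shards; the coordinator merges the partial results
SELECT company_id, count(*) AS order_count
FROM orders
GROUP BY company_id
ORDER BY order_count DESC
LIMIT 10;

The first pattern stays fast as you add workers; the second gets more expensive with every shard, so keep cross-shard aggregates off your hot path.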

Data Sovereignty and Compliance

Operating in Norway brings specific legal obligations. With the GDPR fully enforced, you are responsible for where your data physically sits. Using US-based cloud giants often involves opaque data replication across zones that might drift outside the EEA.

By using CoolVDS, you ensure that every shard—every byte of customer data—resides physically on servers in Oslo. This simplifies your compliance posture significantly when auditing time comes around. The Norwegian Datatilsynet is not known for leniency regarding lack of control over data flows.

Infrastructure Checklist for 2019

If you are deploying a sharded architecture today, ensure your hosting environment ticks these boxes:

- Private Networking: Shards communicate constantly, and public IP traffic adds latency and security risk. CoolVDS standard: free private VLANs.
- NVMe Storage: Sharding creates random I/O patterns that SATA SSDs often choke on. CoolVDS standard: native NVMe.
- KVM Virtualization: Prevents "CPU steal" from noisy neighbors from affecting your query execution times. CoolVDS standard: 100% KVM.

Conclusion

Sharding is complex, but for growing platforms, it is inevitable. The key to surviving the transition is rigorous testing and solid underlying infrastructure. You cannot code your way out of slow disk I/O or network jitter.

Start small: spin up three modest instances, configure ProxySQL or Citus, and benchmark the latency. If you need a sandbox that mimics production hardware without the enterprise price tag, we have optimized our stack specifically for these high-IOPS workloads.

Don't let slow I/O kill your SEO or your sales. Deploy a test sharded cluster on CoolVDS in 55 seconds.