Database Sharding Strategies: Surviving Scale When Vertical Upgrades Fail
There is a specific kind of silence that falls over an engineering room when the primary database server hits 98% CPU utilization and iowait spikes through the roof. It usually happens on a Tuesday morning or, cruelly, during a Black Friday event. You have already maxed out the RAM. You have upgraded to the most expensive CPU tiers available. You have tuned innodb_buffer_pool_size until you are blue in the face. But the connection pool is exhausted, and latency is climbing.
This is the wall. Vertical scaling (upgrading hardware) has a physical limit. If you are operating a high-growth platform in the Nordic market, hitting this wall isn't a possibility; it's an inevitability.
The solution is not more hardware; it is better architecture. It is time to shatter the monolith. It is time to shard.
The Brutal Reality of Sharding
Let's get one thing clear: sharding is complex. It replaces a single point of failure with a distributed system that introduces network latency and consistency challenges. As a rule of thumb, do not shard until your dataset exceeds roughly 2TB or sustained writes exceed 5,000 per second. When you do need it, however, you need infrastructure that can handle the increased chatter between nodes.
In 2023, with tools like Vitess for MySQL or Citus for PostgreSQL, the application layer abstraction has improved, but the underlying metal matters more than ever.
1. The Lookup Strategy (Directory-Based Sharding)
I once worked on a SaaS platform serving the Norwegian health sector. Data locality was critical due to GDPR and specific Datatilsynet requirements. We couldn't just hash user IDs randomly across servers; we needed to know exactly where Tenant A's data lived versus Tenant B's.
We used a Lookup Service. The application queries a lightweight database to find which physical shard holds the data.
Pros: Immense flexibility. You can move shards without changing the sharding key logic.
Cons: The lookup DB becomes a single point of failure and a performance bottleneck.
To mitigate this, we cached the lookup map heavily in Redis, backed by NVMe storage on the persistence layer. On a standard HDD VPS, the lookup latency added 15ms. On CoolVDS NVMe instances, the disk seek time was negligible, keeping total overhead under 2ms.
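The pattern is simple enough to sketch. The snippet below is illustrative rather than our production code: it assumes the redis-py client, a directory table keyed by tenant ID, and a hypothetical lookup_shard_from_db() helper that queries the lookup database on a cache miss.

import redis

r = redis.Redis(host="10.0.0.10", port=6379)

def resolve_shard(tenant_id):
    # Hot path: tenant -> shard mapping served straight from Redis
    cached = r.get(f"shard_map:{tenant_id}")
    if cached:
        return cached.decode()
    # Cache miss: fall back to the lookup database
    # (lookup_shard_from_db is a hypothetical helper querying the directory table)
    shard = lookup_shard_from_db(tenant_id)
    r.setex(f"shard_map:{tenant_id}", 300, shard)  # cache for five minutes
    return shard

The TTL is what keeps shard migrations sane: when a tenant moves to a new shard, you invalidate its cache entry as part of the cutover instead of waiting for the key to expire.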
2. Key-Based Sharding (Algorithmic)
This is the standard approach for massive B2C apps. You take a value (like user_id), hash it, and modulo the result by the number of database nodes.
import zlib

def get_shard_id(user_id, total_shards):
    # Plain modulo sharding with a stable hash
    # (Python's built-in hash() is salted per process, so avoid it here)
    return zlib.crc32(str(user_id).encode()) % total_shards

Note that this is modulo sharding, not consistent hashing: adding a node remaps almost every key, so plan your resharding strategy (or use a proper consistent-hash ring) before the cluster needs to grow.
The infrastructure requirement here is uniformity. If Shard 1 is on a high-performance node and Shard 2 is on a legacy node, your application is only as fast as Shard 2. This is why we standardize our CoolVDS nodes on KVM virtualization with guaranteed CPU cycles: noisy neighbors on a shared host cause "hot shard" issues that are a nightmare to debug.
Technical Implementation: MySQL & ProxySQL
You don't want to rewrite your entire codebase to handle database connections. Enter ProxySQL. It sits between your app and your DB cluster, routing queries based on rules.
Here is how we configured ProxySQL in a recent deployment to route traffic to the correct shard based on sharding keys embedded in query comments.
-- Define hostgroups (Shards)
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (10, 20, 'Shard-01');
INSERT INTO mysql_replication_hostgroups (writer_hostgroup, reader_hostgroup, comment)
VALUES (30, 40, 'Shard-02');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
-- Route traffic based on SQL comments injected by the app
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*-- sharding_key=100', 10, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
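On the application side, the only change is tagging each statement with its sharding key. A minimal sketch, assuming a PyMySQL connection already pointed at ProxySQL; the function and table names are illustrative:

def run_sharded_query(conn, sql, params, sharding_key):
    # Append the routing hint that the ProxySQL rule above matches on
    tagged_sql = f"{sql} -- sharding_key={sharding_key}"
    with conn.cursor() as cur:
        cur.execute(tagged_sql, params)
        return cur.fetchall()

# All of tenant 100's reads land on Shard-01 (destination hostgroup 10)
rows = run_sharded_query(conn, "SELECT * FROM orders WHERE tenant_id = %s", (100,), 100)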
This setup allows the application to remain relatively dumb, while the infrastructure layer handles the complexity. However, ProxySQL introduces an extra network hop. If the virtual NIC is poorly configured (for example, virtio-net without multiqueue or proper interrupt tuning), every hop adds latency and packets get dropped under load. We optimize the network stack on CoolVDS to ensure these internal hops happen at near-wire speed.
PostgreSQL: The Citus Approach
If you are in the PostgreSQL camp (and with PostgreSQL 16 recently released, many are), Citus is the extension of choice. It transforms Postgres into a distributed database.
Setting up a distributed table involves designating a coordinator node and worker nodes.
-- On the Coordinator Node
CREATE EXTENSION citus;
-- Add worker nodes (CoolVDS instances with private networking)
SELECT * FROM citus_add_node('10.0.0.2', 5432);
SELECT * FROM citus_add_node('10.0.0.3', 5432);
-- Distribute the table
CREATE TABLE sensor_data (
device_id bigint,
temperature float,
recorded_at timestamp
);
-- Shard by device_id
SELECT create_distributed_table('sensor_data', 'device_id');
Pro Tip: When running Citus on VPS infrastructure, ensure your provider supports low-latency private networking. Sharded JOIN operations require massive data shuffling between nodes. If that traffic hits the public internet or a throttled VLAN, your query performance will tank. CoolVDS offers unmetered internal traffic specifically for this reason.
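To see what that inter-node traffic looks like in practice, here is an illustrative cross-shard aggregate run through the coordinator. It assumes psycopg2, that 10.0.0.1 is the coordinator from the setup above, and placeholder credentials; Citus computes partial aggregates on each worker and merges them on the coordinator, so every round of that exchange rides the private network.

import psycopg2

# Connect to the coordinator; it plans the query and fans it out to the workers
conn = psycopg2.connect(host="10.0.0.1", dbname="app", user="app", password="secret")
with conn.cursor() as cur:
    # Each worker computes partial averages for its shards;
    # the coordinator merges the partial results
    cur.execute("""
        SELECT device_id, avg(temperature)
        FROM sensor_data
        GROUP BY device_id
        ORDER BY 2 DESC
        LIMIT 10
    """)
    for device_id, avg_temp in cur.fetchall():
        print(device_id, avg_temp)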
Infrastructure Tuning for Shards
Regardless of the strategy, the node configuration is vital. A shard is smaller than a monolith, but it must be faster. Since 2022, we have seen a shift towards NVMe-only architectures for database shards.
Here is a snippet of a my.cnf (MySQL 8.0) optimized for a high-performance shard node running on a CoolVDS 8GB RAM instance:
[mysqld]
# INNODB SETTINGS
# Set to 70-80% of available RAM
innodb_buffer_pool_size = 6G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 1 # ACID compliance is mandatory
innodb_flush_method = O_DIRECT
# CONNECTION SETTINGS
max_connections = 500
thread_cache_size = 50
# I/O TUNING FOR NVMe
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
Note the innodb_io_capacity. Default values in MySQL are often set for spinning disks (HDD). If you don't crank this up on an NVMe drive, you are driving a Ferrari in first gear.
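To confirm the values actually took effect on a running shard, and to raise them without a restart, something along these lines works. The host and credentials are placeholders, mysql-connector-python is just one client option, and SET PERSIST requires the appropriate admin privilege.

import mysql.connector

conn = mysql.connector.connect(host="10.0.0.5", user="admin", password="secret")
cur = conn.cursor()
# Check the running values on the shard node
cur.execute("SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%'")
for name, value in cur:
    print(name, value)
# MySQL 8.0 can change these at runtime and persist them across restarts
cur.execute("SET PERSIST innodb_io_capacity = 2000")
cur.execute("SET PERSIST innodb_io_capacity_max = 4000")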
The Nordic Latency Advantage
Why does geography matter in sharding? Because of the speed of light. If your application servers are in Oslo, but your database shards are scattered across cheap hosting in Frankfurt or Amsterdam, the round-trip time (RTT) kills the benefits of parallel processing.
For a Norwegian user base, hosting your shards in Norway cuts the round-trip time from roughly 25ms (Oslo-Frankfurt) to under 5ms (Oslo-Oslo). That difference compounds: a scatter-gather query that needs two sequential round trips to the shards before the coordinator can merge results spends about 50ms on the network at a 25ms RTT, versus about 10ms at 5ms.
Furthermore, the legal landscape in 2023 requires strict adherence to data sovereignty. Sharding allows you to isolate Norwegian user data onto physical nodes located within national borders, simplifying GDPR audits.
Conclusion
Sharding is not a magic bullet; it is a surgical procedure for systems that have outgrown their skeleton. It requires precise planning, robust SQL routing, and, most critically, infrastructure that respects the laws of physics and I/O performance.
Don't let slow I/O or network jitter undermine your architecture. If you are building the next big platform in the Nordics, you need a foundation that is as battle-ready as your code.
Ready to architect for scale? Deploy your database cluster on CoolVDS NVMe instances today and experience the difference pure performance makes.