Database Sharding in 2017: Survival Strategies for High-Traffic Norwegian Apps

It starts with a subtle rise in I/O wait. Then, your replication lag creeps from 20ms to 200ms. Finally, during a traffic spike—maybe a Black Friday sale or a breaking news event in Oslo—your primary writer node locks up. Too many connections. The CPU is idle, but the disk is screaming.

If you run a growing startup or an enterprise in the Nordics, you have likely maxed out vertical scaling. You have thrown more RAM at the problem, upgraded to the fastest CPUs available, and tuned your my.cnf until it bleeds. But you can't defy physics. When a single writable master cannot handle the write throughput, you enter the painful, complex, yet necessary world of Database Sharding.

I have spent the last month migrating a high-velocity ad-tech platform off a monolithic PostgreSQL instance. The lessons were bloody but valuable. Today we aren't talking about theoretical computer science. We are talking about how to split your data across multiple nodes without losing your mind, using the tools that are stable enough for production right now, in early 2017.

The Ceiling of Vertical Scaling

Before we cut the database into pieces, let's be honest about why. In a standard Master-Slave setup (which we really should call Primary-Replica), reads are easy to scale. You just add more replicas. Writes are the bottleneck.

In 2017, hardware is fast. A CoolVDS KVM instance with NVMe storage provides insane IOPS compared to the spinning rust of 2012. But even with NVMe, locking contention in MySQL's InnoDB engine or PostgreSQL's MVCC eventually limits how many concurrent writes you can process.

Pro Tip: Before sharding, verify your innodb_io_capacity. On CoolVDS NVMe instances, the default value of 200 is laughable. Crank it up to 2000 or higher to match the underlying hardware capabilities.

Sharding Strategies: Pick Your Poison

Sharding means horizontal partitioning: splitting rows across different database instances (shards) based on a specific key, the shard key. There are three main architectural patterns we see in the wild today.

1. Key Based (Hash) Sharding

You take a value like user_id, hash it, and do a modulo operation against the number of shards. This distributes data evenly.


// PHP 7.0 example: hash the shard key, then take the modulo
$numberOfShards = 4;
$shardId = crc32((string) $userId) % $numberOfShards;
$connectionString = $config['shards'][$shardId];

Pros: Even distribution of data. Hotspots are rare unless one user is abnormally active.
Cons: Resharding is a nightmare. If you go from 4 to 5 shards, the modulo changes, and you have to migrate almost all data.
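
How bad is "almost all"? A quick standalone sketch (illustrative only, not from our migration) counts how many of 100,000 users land on a different shard when you grow from 4 to 5:


// Count keys that map to a different shard after growing from 4 to 5 shards
$moved = 0;
$total = 100000;
for ($userId = 1; $userId <= $total; $userId++) {
    if (crc32((string) $userId) % 4 !== crc32((string) $userId) % 5) {
        $moved++;
    }
}
// Prints roughly 80%: four out of five rows have to move
printf("%d of %d keys moved (%.1f%%)\n", $moved, $total, 100 * $moved / $total);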

2. Range Based Sharding

You split data based on value ranges. Users 1-100,000 go to Shard A. Users 100,001-200,000 go to Shard B.

Pros: Easy to implement. Easy to add new shards (just start a new range).
Cons: The "newer" shard takes all the heat. If User 200,001 is active but User 1 is dead, Shard B melts while Shard A sleeps.
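
A range router is little more than an ordered list of boundaries. Here is a minimal sketch (the DSNs are placeholders for your own shard hosts):


// Each entry: [inclusive upper bound of the user_id range, shard DSN]
$ranges = [
    [100000, 'mysql:host=shard-a.internal;dbname=app'],
    [200000, 'mysql:host=shard-b.internal;dbname=app'],
];

function shardFor($userId, array $ranges)
{
    foreach ($ranges as list($upperBound, $dsn)) {
        if ($userId <= $upperBound) {
            return $dsn;
        }
    }
    throw new RuntimeException("No shard configured for user $userId");
}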

3. Directory Based (Lookup) Sharding

You maintain a separate lookup database that maps a user_id to a specific shard. This is the most flexible approach but introduces a single point of failure (the lookup DB).
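
In practice the directory is one small, heavily indexed table on its own instance. A minimal sketch, assuming a hypothetical user_shard_map table (host and credentials are placeholders); cache the result aggressively, because this lookup sits on every request's critical path:


// Dedicated lookup DB that maps user_id -> shard_id
$lookup = new PDO('mysql:host=lookup-db.internal;dbname=directory', $dbUser, $dbPass);

function shardIdForUser(PDO $lookup, $userId)
{
    $stmt = $lookup->prepare('SELECT shard_id FROM user_shard_map WHERE user_id = ?');
    $stmt->execute([$userId]);
    $shardId = $stmt->fetchColumn();
    if ($shardId === false) {
        throw new RuntimeException("User $userId has no shard assignment");
    }
    return (int) $shardId;
}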

Implementation: The Middleware Layer

In 2017, application-side sharding (changing your code to pick the DB) is common but messy. It couples your infrastructure to your codebase. A better approach gaining serious traction is using middleware like ProxySQL.

ProxySQL allows you to define query rules that route traffic transparently. Your app thinks it's talking to one DB; ProxySQL sends it to the correct shard.

ProxySQL Configuration Example

Here is how you might configure sharding logic in ProxySQL, routing on the comment ID that appears in the query text. This assumes you are sharding a massive comments table.


-- Define the backend shards (hostgroups)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.0.1', 3306);
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.2', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Route on the literal id in the query text: even last digit to hostgroup 10,
-- odd to hostgroup 20 (simplistic, but it shows the mechanism)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT .* FROM comments WHERE id = [0-9]*[02468]$', 10, 1);

INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT .* FROM comments WHERE id = [0-9]*[13579]$', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

This allows you to add shards at the infrastructure level without rewriting your PHP or Python application logic.
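
From the application's point of view, nothing changes. It connects to ProxySQL's traffic port (6033 by default) as if it were a single MySQL server; the hostname below is a placeholder:


// The app only sees the ProxySQL endpoint; routing happens behind it
$db = new PDO('mysql:host=proxysql.internal;port=6033;dbname=appdb', $dbUser, $dbPass);
$body = $db->query('SELECT body FROM comments WHERE id = 42')->fetchColumn();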

The Infrastructure Reality: Latency Kills

Sharding introduces complexity. A single user request might need to query three different shards to assemble a dashboard. This is where network latency becomes the silent killer. If your shards are scattered across cheap VPS providers with poor peering, the aggregate latency will make your app feel sluggish.
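
One mitigation is to fan the shard queries out in parallel instead of serially. A minimal sketch using mysqli's async mode (requires the mysqlnd driver; hosts and credentials are placeholders):


// Fire the same query at three shards at once, then collect results as they land
$hosts = ['10.0.0.1', '10.0.0.2', '10.0.0.3'];
$links = [];
foreach ($hosts as $host) {
    $link = mysqli_connect($host, 'app', 'secret', 'appdb');
    $link->query('SELECT COUNT(*) FROM comments WHERE user_id = 42', MYSQLI_ASYNC);
    $links[] = $link;
}

$rows = [];
$pending = $links;
while ($pending) {
    $read = $error = $reject = $pending;
    if (mysqli_poll($read, $error, $reject, 1) < 1) {
        continue; // nothing ready yet, poll again
    }
    foreach ($read as $link) {
        if ($result = $link->reap_async_query()) {
            $rows[] = $result->fetch_row();
            $result->free();
        }
        unset($pending[array_search($link, $pending, true)]);
    }
}
// Wall time is now roughly the slowest shard, not the sum of all round trips

Even fanned out, every query still pays the network latency floor between your app server and the shards.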

This is particularly critical here in Norway. If your primary audience is in Oslo or Bergen, routing traffic through a datacenter in Frankfurt adds 30-40ms round trip. Multiply that by 10 queries per page load, and your site feels broken.

Why CoolVDS Architecture Matters Here

When we architected the CoolVDS network, we focused on two things: KVM Virtualization and NIX (Norwegian Internet Exchange) Peering.

  • KVM: We don't use OpenVZ. In a sharded setup, you need guaranteed resources. OpenVZ "burstable" RAM is a liability when a shard is under load. KVM ensures your kernel is yours.
  • Local Peering: Our latency to major Norwegian ISPs is often under 5ms. When your application server queries Shard A, B, and C simultaneously, that low latency keeps the aggregate response time fast.

Data Sovereignty and the "Datatilsynet" Factor

We are all watching the regulatory landscape shift. While GDPR is technically an EU regulation coming into full force next year (2018), Norwegian companies are already under scrutiny by Datatilsynet. Keeping your user data shards physically located in Norway isn't just about performance; it's about compliance and risk mitigation.

By sharding your data across CoolVDS instances in our Oslo datacenter, you simplify the legal argument. The data never leaves the jurisdiction. It sits on encrypted NVMe storage, protected by Norwegian privacy laws.

Optimizing the MySQL Config for Shards

A shard is smaller than a monolith, but it runs hotter. You need to tune MySQL 5.7 specifically for this. Standard configs are too conservative.


[mysqld]
# Barracuda file format (the 5.7 default) enables compressed and
# dynamic row formats, plus the long index prefixes below
innodb_file_format = Barracuda
innodb_large_prefix = 1

# Essential for data integrity on KVM instances
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

# Buffer pool: roughly 70% of instance RAM (4G assumes a ~6GB CoolVDS plan)
innodb_buffer_pool_size = 4G

# Let InnoDB's background flushing actually use the NVMe headroom
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

The Trade-Offs

Sharding is not a silver bullet. You lose ACID transactions across shards (unless you implement complex Two-Phase Commit, which kills performance). You can no longer do simple JOINs across tables that live on different servers. You have to perform the join in your application code.
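
Here is what that application-side join looks like (a sketch; $shardA and $shardB are assumed to be existing PDO connections, and the schema is illustrative):


// Comments live on shard B, user profiles on shard A
$comments = $shardB->query(
    'SELECT id, user_id, body FROM comments ORDER BY id DESC LIMIT 20'
)->fetchAll(PDO::FETCH_ASSOC);

$userIds = array_values(array_unique(array_column($comments, 'user_id')));

$usersById = [];
if ($userIds) {
    $placeholders = implode(',', array_fill(0, count($userIds), '?'));
    $stmt = $shardA->prepare("SELECT id, name FROM users WHERE id IN ($placeholders)");
    $stmt->execute($userIds);
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $user) {
        $usersById[$user['id']] = $user;
    }
}

// Stitch the two result sets together in memory: this is the "join"
foreach ($comments as &$comment) {
    $comment['author'] = $usersById[$comment['user_id']] ?? null;
}
unset($comment);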

Feature              Monolith                    Sharded (CoolVDS)
Write throughput     Limited by a single node    Scales near-linearly
Complexity           Low                         High
Cross-table joins    Easy (SQL)                  Hard (application layer)
Cost                 High (big iron)             Moderate (multiple small instances)

Conclusion

If you are pushing the limits of a single server, sharding is your future. It requires a shift in thinking and a robust infrastructure foundation. You cannot build a reliable distributed database on unreliable, oversold hosting.

You need dedicated resources, consistent NVMe I/O performance, and the lowest possible latency to your users. Don't let your database become the bottleneck that kills your growth.

Ready to architect for scale? Spin up a high-performance KVM instance on CoolVDS today and test your sharding logic with 1ms latency to Oslo.