Scaling Beyond the Limit: Database Sharding Strategies for High-Traffic Norwegian Workloads

There is a specific moment of dread every systems architect recognizes. It usually happens at 2:00 AM on a Tuesday. Your primary database node—despite having 32 cores and 128GB of RAM—hits 98% CPU utilization. The I/O wait creeps up, and the slow query log starts growing faster than you can read it. You have optimized every index. You have tuned innodb_buffer_pool_size to the edge of physical memory. You have even moved to NVMe storage. But the monolithic architecture has hit its mathematical ceiling.

It is time to shard.

In the Norwegian hosting market, where data sovereignty (thanks to the recently enforced GDPR) and latency to Oslo are critical, sharding isn't just about performance; it is often about compliance. If you are serving users from Tromsø to Kristiansand, simply throwing more hardware at a single instance is a temporary bandage. Here is how we handle database sharding in production environments in 2018, moving beyond the monolith without destroying data integrity.

The Hard Truth: Vertical vs. Horizontal Scaling

Upgrading your VPS plan (Vertical Scaling) is the easy button. Moving from a standard SSD instance to a high-performance CoolVDS NVMe plan will drastically reduce I/O bottlenecks. For 80% of businesses, this is the correct move. It buys you 12 to 24 months of runway. However, if your write-throughput exceeds what a single disk controller can handle, or if your dataset size makes backups take longer than your maintenance window allows, you must scale horizontally.

Pro Tip: Before you shard, ensure you are actually bound by write-capacity or size. If your issue is read-capacity, set up standard Master-Slave replication first. Sharding adds significant complexity to your application logic. Do not introduce it unless necessary.

Sharding Strategies for 2018 Architectures

Sharding involves splitting your data across multiple database servers (shards) so that each server holds only a fraction of the total dataset. There are three primary patterns we see deployed effectively in Europe right now.

1. Key-Based (Hash) Sharding

This is the most common method for SaaS platforms. You take a unique identifier (like a user_id or customer_uuid), run it through a hash function, and use the result to determine which shard the data lives on.

# Python example: hash the sharding key, then take the result modulo the shard count
import zlib

total_shards = 4  # number of physical shards in the cluster
shard_id = zlib.crc32(str(user_id).encode("utf-8")) % total_shards
connection_string = f"db_shard_{shard_id}.coolvds.net"
connect(connection_string)  # placeholder for your driver's connect call (e.g. pymysql.connect)

The Trade-off: It distributes load evenly, which is great. However, because the shard is derived from the key modulo the shard count, adding new shards later remaps almost every key; re-balancing the entire cluster is a nightmare operation involving massive data migration scripts.

2. Directory-Based Sharding (Lookup Service)

You maintain a separate lookup database that maps a routing key to a specific physical shard. This allows you to move users between shards without changing the sharding logic.
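
As a rough sketch, assuming a small routing database with a shard_map(user_id, shard_host) table and the pymysql driver (the table name, hostnames, and credentials here are placeholders, not part of any CoolVDS product), the lookup logic looks like this:

# Illustrative lookup-service routing; shard_map, hostnames and credentials are placeholders
import pymysql

def get_shard_host(lookup_conn, user_id):
    # One round-trip to the small routing database
    with lookup_conn.cursor() as cur:
        cur.execute("SELECT shard_host FROM shard_map WHERE user_id = %s", (user_id,))
        row = cur.fetchone()
    return row[0] if row else None  # None = not yet assigned; pick a shard and insert a mapping

lookup_conn = pymysql.connect(host="db-lookup.coolvds.net", user="app", password="secret", db="routing")
shard_host = get_shard_host(lookup_conn, 42)
# ...then connect to shard_host with your normal driver settings

The price is an extra lookup on every request, so this table is typically kept small and cached aggressively in the application layer.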

3. Geo-Sharding (The GDPR Compliance Winner)

With the General Data Protection Regulation (GDPR) coming into full force this past May, data locality is paramount. Geo-sharding allows you to store data based on the user's physical location.

For a Norwegian enterprise, you might configure your architecture such that:

  • Shard A (Oslo DC): Stores users with country_code = 'NO'.
  • Shard B (Frankfurt DC): Stores users with country_code = 'DE'.

This keeps Norwegian data within national borders—a massive selling point when dealing with Datatilsynet (The Norwegian Data Protection Authority) and ensuring minimal latency via NIX (Norwegian Internet Exchange).
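
In application code, the routing itself can be as simple as a static map keyed on the user's verified country code (the hostnames below are placeholders for your own shard endpoints):

# Illustrative geo-routing map; hostnames are placeholders
GEO_SHARDS = {
    "NO": "db-shard-oslo.coolvds.net",       # Norwegian users stay in the Oslo DC
    "DE": "db-shard-frankfurt.coolvds.net",  # German users are served from Frankfurt
}

def shard_for_country(country_code, default="db-shard-oslo.coolvds.net"):
    # Unmapped countries fall back to a default shard; adjust to your compliance policy
    return GEO_SHARDS.get(country_code.upper(), default)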

Technical Implementation: Using ProxySQL

In 2018, we don't need to hardcode sharding logic into the application layer if we don't want to. ProxySQL has emerged as the battle-tested standard for this. It sits between your application and your database nodes, routing queries based on rules you define.

Here is a real-world configuration example for routing write traffic based on sharding keys using ProxySQL 1.4:

-- Define your backend servers (The Shards)
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (10, '10.0.0.5', 3306); -- Shard 1
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES (20, '10.0.0.6', 3306); -- Shard 2

-- Load them to runtime
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Create sharding rules (simplified example)
-- match_pattern is a regular expression tested against the query text,
-- so these rules assume queries filter on "WHERE user_id = <n>"

-- Route queries for users 1-1000 to Shard 1 (Hostgroup 10)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, 'user_id\s*=\s*([1-9][0-9]{0,2}|1000)\b', 10, 1);

-- Route queries for users 1001-2000 to Shard 2 (Hostgroup 20)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, 'user_id\s*=\s*(100[1-9]|10[1-9][0-9]|1[1-9][0-9]{2}|2000)\b', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

This abstraction layer allows you to add shards or migrate data in the background without deploying new code to your web servers.

Infrastructure Requirements: Latency Kills

Sharding introduces network overhead. A request might hit your load balancer, then your web server, then ProxySQL, and finally the specific database shard. If you are hosting these components on congested networks or budget VPS providers with noisy neighbors, that overhead accumulates fast.

This is where the underlying virtualization technology matters. We rely strictly on KVM (Kernel-based Virtual Machine) at CoolVDS. Unlike OpenVZ, which shares the host kernel and can suffer from CPU stealing when a neighbor gets busy, KVM provides true hardware isolation. When you are calculating sharding keys across 4 nodes, you need consistent CPU performance, not "burstable" promises.

Optimizing the Host Node

Regardless of your sharding strategy, the individual nodes must be tuned. On a standard Ubuntu 16.04 or 18.04 LTS database node, we recommend the following kernel tweaks in /etc/sysctl.conf to handle the high connection counts typical of sharded environments:

# Increase the number of allowable open files
fs.file-max = 2097152

# Minimize swapping (crucial for MySQL/PostgreSQL performance)
vm.swappiness = 1

# Improve network connections handling
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
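
# Reload the file after editing (no reboot required):
#   sysctl -p /etc/sysctl.conf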

The ID Generation Problem

When you shard, you lose the ability to use AUTO_INCREMENT safely across the cluster. If Shard A creates User #100 and Shard B creates User #100, you have a collision. You must switch to a global ID generation strategy.

Common solutions widely used this year include:

  1. UUIDs: Easy to implement, but the string length and lack of sequential ordering can fragment InnoDB indexes, hurting write performance on spinning disks (less of an issue on our NVMe storage, but still relevant).
  2. Twitter Snowflake: Generates unique 64-bit integers based on time, machine ID, and sequence number (see the sketch after this list).
  3. Centralized Ticket Server: A small Redis or MySQL instance dedicated solely to handing out incrementing IDs (Flickr popularized this method).
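
For illustration, here is a minimal Snowflake-style generator in Python. The bit layout (41-bit timestamp, 10-bit machine ID, 12-bit sequence) follows the published scheme, but the class name and the custom epoch are assumptions for this sketch rather than Twitter's reference implementation:

# Minimal Snowflake-style 64-bit ID generator (illustrative; ignores clock drift for brevity)
import time
import threading

EPOCH_MS = 1514764800000  # custom epoch: 2018-01-01 UTC (assumption for this example)

class SnowflakeGenerator:
    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024  # 10 bits reserved for the machine/shard ID
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                # Same millisecond: bump the 12-bit sequence; spin to the next ms if it overflows
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            # 41-bit timestamp | 10-bit machine ID | 12-bit sequence
            return ((now - EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

Each shard (or application node) gets its own machine_id, so IDs stay unique across the cluster and remain roughly time-ordered, which keeps InnoDB inserts append-friendly.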

Conclusion: Start with the Right Foundation

Database sharding is complex. It increases your operational overhead and requires a skilled team to manage. However, for high-growth applications targeting the Norwegian and broader European market, it is the standard path to scaling writes and data volume beyond what any single node can deliver.

Before you rewrite your application logic, ensure your infrastructure isn't the actual bottleneck. High-frequency trading algorithms and massive e-commerce shops run on our infrastructure because we prioritize low latency and raw I/O throughput. Don't let slow hardware force you into premature architectural complexity.

Ready to test your sharding logic? Deploy high-performance KVM instances in Oslo on CoolVDS in under 55 seconds.