Console Login

Surviving the Write Cliff: Database Sharding Strategies for High-Scale Systems

Surviving the Write Cliff: Database Sharding Strategies for High-Scale Systems

Your master database just hit 80% CPU utilization. You upgraded the RAM last month. You switched to NVMe storage last week. But the write queries keep piling up, and replication lag is creeping from milliseconds into seconds. Welcome to the "Write Cliff."

In the Nordic hosting market, we often solve performance issues by throwing hardware at them. It's the pragmatic approach. But there is a physical limit to vertical scaling. You can't just buy a bigger server forever. If you are running a high-traffic SaaS or an e-commerce platform targeting the European market, you eventually have to face the inevitable beast: Sharding.

As of June 2018, with GDPR freshly enforced as of last month, sharding isn't just about performance anymore—it's about compliance. Let's dig into how to shard your architecture without destroying your data integrity.

The "Oh S**t" Moment: When to Shard

Sharding is complex. It introduces operational overhead that can crush a small team. Do not shard if your dataset is under 1TB or if you can handle the load with a well-tuned Read Replica setup. You shard when:

  • Write Throughput Saturation: A single master cannot persist data fast enough.
  • Storage Limits: Your dataset exceeds the capacity of a single node's storage (even with our large CoolVDS NVMe volumes).
  • Geographic Compliance: You need to store Norwegian users' data physically in Oslo to satisfy strict interpretation of local data residency requirements.

Strategy 1: Application-Level Sharding (The "Hard" Way)

This is the most common approach for teams moving from a monolith. Your application logic decides which database server to connect to. The standard method is Directory-Based Sharding or Hash-Based Sharding.

The Lookup Table Approach

You maintain a lightweight database that maps entities (like `TenantID` or `UserID`) to a specific Shard ID.


// PHP 7.2 Example: Simple Shard Router
class ShardRouter {
    private $lookupDB;

    public function __construct($pdo) {
        $this->lookupDB = $pdo;
    }

    public function getShardConnection($userId) {
        // Fetch the shard host for this user
        $stmt = $this->lookupDB->prepare("SELECT shard_host FROM user_shards WHERE user_id = ?");
        $stmt->execute([$userId]);
        $host = $stmt->fetchColumn();

        if (!$host) {
            throw new Exception("User not assigned to a shard.");
        }

        // Connect to the specific CoolVDS instance
        $dsn = "mysql:host=$host;dbname=app_data;charset=utf8mb4";
        return new PDO($dsn, 'user', 'secure_pass');
    }
}

The Problem: The Lookup DB becomes a single point of failure (SPOF). You need to cache this aggressively (Redis/Memcached) and replicate the lookup DB itself.

Strategy 2: Middleware Sharding with ProxySQL

In 2018, putting logic in your app is becoming less popular compared to intelligent middleware. ProxySQL is the battle-tested standard here. It sits between your app and your database nodes, speaking the MySQL protocol.

This allows you to shard transparently. Your app thinks it's talking to one DB, but ProxySQL routes the query based on rules.

Pro Tip: When using ProxySQL on CoolVDS instances, ensure you enable SO_REUSEPORT in the config to maximize throughput on multi-core KVM instances. We see a 20% drop in latency just by tuning the network stack correctly.

Configuration for Sharding

Here is how you might configure `proxysql.cnf` to shard traffic based on a `shard_id` comment in your SQL queries (a common 2018 pattern).


# proxysql.cnf snippet

mysql_query_rules:
(
    {
        rule_id=10
        active=1
        match_pattern="/\* shard=1 \*/"
        destination_hostgroup=10
        apply=1
    },
    {
        rule_id=20
        active=1
        match_pattern="/\* shard=2 \*/"
        destination_hostgroup=20
        apply=1
    }
)

Your developers simply append a comment to their queries: SELECT * FROM orders WHERE user_id=55 /* shard=1 */. ProxySQL handles the rest. This keeps your application code relatively clean.

Infrastructure Matters: The Latency Killer

Sharding increases network chatter. A query might hit a router, then a shard, then return. If your shards are hosted on oversold hardware with "noisy neighbors" stealing CPU cycles, your p99 latency will spike unpredictably.

This is where the underlying virtualization technology matters. At CoolVDS, we strictly use KVM (Kernel-based Virtual Machine). Unlike OpenVZ or LXC containers often found in budget VPS providers, KVM provides true hardware isolation. When you run a sharded MySQL cluster, you need consistent I/O performance.

Optimizing MySQL 5.7 for NVMe Shards

If you are deploying on our NVMe-backed instances, the default MySQL 5.7 settings are too conservative. You need to tell InnoDB that it can push the disk harder.


# /etc/mysql/my.cnf

[mysqld]
# Ensure unique ID for replication/sharding
server-id = 101 

# Utilize the speed of CoolVDS NVMe
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_method = O_DIRECT

# durability vs performance trade-off
# Set to 2 if you can tolerate 1 second of data loss for massive write speed gain
innodb_flush_log_at_trx_commit = 1 

# Buffer pool should be 70-80% of your VPS RAM
innodb_buffer_pool_size = 12G

The GDPR Angle: Geo-Sharding

With the recent enforcement of GDPR (May 2018), data sovereignty is critical. Norwegian businesses are particularly sensitive to data leaving the EEA.

You can design your sharding key based on CountryCode.
Shard A (Oslo DC): Users with `country=NO`.
Shard B (Frankfurt DC): Users with `country=DE`.

This architecture simplifies compliance audits. If Datatilsynet knocks on your door asking where Norwegian user data lives, you can point to a specific set of IPs physically located in Norway, rather than a nebulous "cloud bucket."

Conclusion

Sharding is a double-edged sword. It solves scaling limits but adds complexity to backups, schema changes, and reporting. Before you shard, optimize your queries and maximize your hardware. But when the time comes, don't build a fragile house of cards.

Use robust middleware like ProxySQL, enforce strict data locality for GDPR, and host on infrastructure that guarantees IOPS. Low latency isn't a luxury; it's a requirement for distributed databases.

Ready to architect your cluster? Deploy a high-performance KVM instance in our Oslo datacenter in under 55 seconds.