Taming Distributed Chaos: Implementing Temporal.io Workflows on Norwegian Infrastructure

Let's be honest: the "happy path" in microservices is a lie. If you have been in the trenches long enough, you know that networks partition, third-party APIs time out, and databases lock up right when your traffic peaks. I recently watched a checkout service bleed money because a downstream inventory check failed and the ad-hoc retry logic in a Kafka consumer spiraled into a thundering herd. It wasn't pretty.

We used to patch these issues with complex Saga patterns and messy state machines inside our databases. In 2021, the conversation has shifted to Temporal, which promises to abstract away the hard parts of distributed-systems reliability. But running a Temporal cluster isn't free: it generates heavy write I/O and demands low-latency storage.

Here is how to architect a Temporal setup that actually survives production, keeping data sovereignty in mind.

The Problem: 'Workflow as Code' vs. The Reality of Hardware

Temporal (a fork of Uber's Cadence) allows you to write workflows as code. It maintains the state of your application, meaning if your server crashes in the middle of a 30-day billing cycle process, the workflow resumes exactly where it left off on a different node. It sounds like magic, but under the hood, it is a write-heavy beast.

The History Service in Temporal writes every single event (workflow started, activity scheduled, timer fired) to the persistence layer. If you are using PostgreSQL or Cassandra, your disk I/O latency directly correlates to your workflow throughput.

Pro Tip: Do not run Temporal's persistence layer on shared, noisy-neighbor storage. The high frequency of small writes (fsyncs) requires NVMe. We benchmarked this on CoolVDS NVMe instances against standard SSD VPS providers, and the workflow throughput on CoolVDS was roughly 3x higher due to lower I/O wait times.
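
If you want to sanity-check a candidate disk before committing, a short fio run with an fdatasync after every write approximates the History Service's small-synchronous-write pattern reasonably well. The job parameters below are illustrative, not a reproduction of our exact benchmark:

# Small random writes with fdatasync after each one, similar to Temporal's persistence pattern
fio --name=fsync-test --rw=randwrite --bs=4k --size=1G \
    --fdatasync=1 --runtime=60 --time_based --group_reporting

Look at the completion latency percentiles in the output; if the 99th percentile sits in the multi-millisecond range, your workflow throughput will suffer.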

Step 1: The Workflow Implementation (Go SDK)

Let's look at a practical example. We are building a subscription renewal service: charge the credit card, and if that fails, wait 24 hours and try again, up to three attempts in total. With cron jobs and database flags this would be 200 lines of fragile code. With Temporal (using the Go SDK v1.9), it looks like this:

package app

import (
	"time"
	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

func SubscriptionWorkflow(ctx workflow.Context, userID string) error {
	options := workflow.ActivityOptions{
		// Each individual attempt must finish within a minute.
		StartToCloseTimeout: time.Minute,
		RetryPolicy: &temporal.RetryPolicy{
			// Wait 24 hours between attempts; a BackoffCoefficient of 1.0 keeps the interval fixed.
			InitialInterval:    time.Hour * 24,
			BackoffCoefficient: 1.0,
			// MaximumAttempts counts the first attempt, so this is one charge plus two retries.
			MaximumAttempts:    3,
		},
	}
	ctx = workflow.WithActivityOptions(ctx, options)

	// This activity will automatically retry based on the policy defined above.
	err := workflow.ExecuteActivity(ctx, ChargeCreditCard, userID).Get(ctx, nil)

	if err != nil {
		// Send failure email activity
		_ = workflow.ExecuteActivity(ctx, SendFailureEmail, userID).Get(ctx, nil)
		return err
	}

	return nil
}

This code is deceptively simple. The complexity is handled by the Temporal Server, which manages the timers and queues.
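
Note that the workflow function on its own does nothing: a worker process has to register it and poll a task queue before the server will dispatch tasks to it. A minimal worker, assuming the workflow and activity functions are visible from the same package, the frontend listens on localhost:7233, and using "SUBSCRIPTIONS" as an illustrative task queue name, looks roughly like this:

package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Connect to the Temporal frontend service.
	c, err := client.NewClient(client.Options{HostPort: "localhost:7233"})
	if err != nil {
		log.Fatalf("unable to create Temporal client: %v", err)
	}
	defer c.Close()

	// The worker long-polls the task queue and executes whatever it has registered.
	w := worker.New(c, "SUBSCRIPTIONS", worker.Options{})
	w.RegisterWorkflow(SubscriptionWorkflow)
	w.RegisterActivity(ChargeCreditCard)
	w.RegisterActivity(SendFailureEmail)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalf("worker exited: %v", err)
	}
}

Your API layer then starts an execution with client.ExecuteWorkflow against the same task queue, and the server routes tasks to whichever worker is alive.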

Step 2: Deploying the Server Infrastructure

For a production environment in 2021, you generally have two choices for persistence: Cassandra (for massive scale) or PostgreSQL (for ease of management). For most European deployments handling under 5,000 workflows per second, PostgreSQL 13 is solid.

Here is a battle-tested docker-compose.yml snippet optimized for a mid-sized deployment. Note the tuning on the Postgres command:

version: '3.5'
services:
  temporal:
    image: temporalio/auto-setup:1.12.0
    environment:
      - DB=postgresql
      - DB_PORT=5432
      - POSTGRES_USER=temporal
      - POSTGRES_PWD=temporal_secret
      - POSTGRES_SEEDS=postgres
    ports:
      # Frontend gRPC endpoint used by workers and tctl
      - "7233:7233"
    depends_on:
      - postgres

  postgres:
    image: postgres:13-alpine
    environment:
      - POSTGRES_USER=temporal
      - POSTGRES_PASSWORD=temporal_secret
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    # Optimization for dedicated VPS resources
    command: postgres -c 'max_connections=200' -c 'shared_buffers=2GB' -c 'effective_cache_size=6GB'
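
To bring the stack up, something like the following is enough; the auto-setup image creates the database schemas on first boot, so tail the logs until the frontend reports it is ready before pointing workers at it:

# Start the stack in the background and watch schema setup complete
docker-compose up -d
docker-compose logs -f temporal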

Critical Configuration: When deploying this on a VPS, you must ensure your kernel limits allow for enough open files, as Temporal creates many connections between the Frontend, History, and Matching services.

# Check current limits
ulimit -n

# Increase in /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535
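
Keep in mind that limits.conf is applied by PAM at login, so it does not affect services started by systemd. If your workers (or Docker itself) run as systemd units, raise the limit in the unit instead; the drop-in path and unit name below are just an example:

# /etc/systemd/system/temporal-worker.service.d/limits.conf (example drop-in)
[Service]
LimitNOFILE=65535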

The Data Sovereignty Angle: Why Norway?

Since the Schrems II ruling last year (July 2020), transferring personal data to US-controlled clouds has become a legal minefield for European companies. The Norwegian Datatilsynet (Data Protection Authority) is notoriously strict.

Temporal workflows often contain PII (Personally Identifiable Information) in their payloads—email addresses, user IDs, transaction details. If you host your Temporal History Service on a US hyperscaler, you risk non-compliance. Hosting on CoolVDS in Oslo mitigates this risk. Our data centers are subject to Norwegian law, providing a robust layer of privacy protection that sits outside the direct reach of the US CLOUD Act.

Optimizing for Latency

Temporal workers poll the server for tasks. High latency between your workers and the Temporal server results in "poll lag." If your infrastructure is split across regions, your workflows will feel sluggish.

We recommend a colocation strategy:

Component        | Recommended Specs               | CoolVDS Instance
Frontend Service | CPU bound; high clock speed     | Compute Optimized (C-Series)
History Service  | I/O bound; fast NVMe essential  | High I/O (NVMe-Series)
PostgreSQL DB    | Memory and I/O bound            | Dedicated Server or High-RAM VPS

Connecting via the NIX (Norwegian Internet Exchange) ensures that if your users are in Scandinavia, the latency from the initial API call to the workflow start is measured in single-digit milliseconds.

Performance Tuning the Worker

The default settings in the Go SDK are conservative. If you are running on a powerful CoolVDS instance, you can crank up the concurrency to process activities faster. However, be careful not to starve the CPU.

workerOptions := worker.Options{
    // Upper bound on activities this worker runs in parallel; size it to the cores and RAM you actually have.
    MaxConcurrentActivityExecutionSize:     100,
    MaxConcurrentWorkflowTaskExecutionSize: 50,
}

If you see `ContextDeadlineExceeded` errors, it often means your worker is overloaded or your VPS is stealing CPU cycles. This is where dedicated resources matter. We use KVM virtualization on CoolVDS to ensure that the CPU cycles you pay for are actually yours, preventing the "noisy neighbor" effect that plagues container-based hosting during peak hours.
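
To tell the two apart, watch the steal column while a worker is under load; a steal value that stays above zero means the hypervisor is handing your vCPU time to someone else rather than your worker being genuinely saturated:

# 'st' (steal) shows CPU time taken by the hypervisor; 'us'/'sy' show your own load
vmstat 1 5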

Conclusion

Temporal solves the logic problem of microservices, but it introduces an infrastructure problem. It demands fast disks and low latency, and because workflow history persists your payloads, it puts your compliance posture on the line. Don't let your workflow engine be the bottleneck.

If you are building the next generation of resilient systems in Europe, you need infrastructure that respects your data and your need for speed. Spin up a CoolVDS NVMe instance in our Oslo zone today and stop worrying about your RetryPolicy.