Platform Scalability

Scalability is the ability of a system to handle increased load without degraded performance or reliability. But scalability as a practice is not just about adding resources—it's about understanding your limits, planning before you hit them, and making scaling decisions that don't require heroics.

This page covers how to think about scalability, when to invest in it, and how to build systems that grow with your business instead of blocking it.


What problem this solves

Growth is supposed to be good news. But when systems can't handle growth, it becomes a crisis: outages during traffic spikes, degraded user experience, and engineering teams stuck in reactive firefighting instead of building.

Scalability planning solves this by:

  • Identifying bottlenecks before they become outages.
  • Creating headroom so growth doesn't require emergency work.
  • Making scaling decisions intentional, not reactive.
  • Balancing cost and capacity appropriately.

The cost of not planning is that scaling becomes crisis management. You find limits during incidents, not during planning.


When to invest in scalability

Invest now if:

  • You're seeing performance degradation during peak traffic.
  • Capacity limits have caused outages in the last quarter.
  • Business growth is projected to exceed current headroom within 6 months.
  • You're launching to a new market or customer segment with different usage patterns.
  • On-call is regularly paged for capacity-related issues.

Defer if:

  • Current headroom is sufficient for the next 12 months.
  • Usage patterns are stable and well-understood.
  • Other themes (reliability, security) are more pressing.
  • You're in a cash-constrained environment and can accept some risk.

The goal is not to scale for hypothetical future load—it's to have enough headroom that growth doesn't become an emergency.


Scaling strategies

Vertical scaling

Add more resources to existing instances—bigger machines, more memory, faster CPUs.

When to use:

  • Quick wins when you have a clear resource bottleneck.
  • Simpler architectures where horizontal scaling isn't worth the complexity.
  • Databases where horizontal scaling introduces coordination overhead.

Trade-offs:

  • Limited by the largest available instance size.
  • Often more expensive per unit of capacity at scale.
  • Single points of failure unless combined with redundancy.

Horizontal scaling

Add more instances of a service and distribute load across them.

When to use:

  • Stateless services that can run in parallel.
  • When you've hit vertical limits.
  • When you need redundancy as well as capacity.

Trade-offs:

  • Requires load balancing and session management.
  • Introduces coordination complexity for stateful operations.
  • Not all systems scale horizontally without architectural changes.
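The core mechanic, identical stateless instances behind a distribution policy, can be sketched in a few lines. In practice a load balancer (nginx, HAProxy, a cloud ELB) does this; the instance names here are illustrative:

```python
from itertools import cycle

# Hypothetical pool of identical stateless instances.
instances = ["app-1", "app-2", "app-3"]

def make_balancer(targets):
    """Return a function that picks the next instance round-robin."""
    ring = cycle(targets)
    return lambda: next(ring)

pick = make_balancer(instances)
# Each call rotates through the pool, spreading load evenly.
assignments = [pick() for _ in range(6)]
```

Round-robin is the simplest policy; real balancers also weigh instances by health and in-flight load, which matters once instances are heterogeneous.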

Caching

Store frequently accessed data closer to the consumer to reduce load on origin systems.

When to use:

  • Read-heavy workloads with cacheable data.
  • Reducing latency for geographically distributed users.
  • Protecting databases from repetitive queries.

Trade-offs:

  • Cache invalidation is hard—stale data is a real risk.
  • Adds operational complexity (cache servers, eviction policies).
  • Not effective for write-heavy or highly dynamic data.

Asynchronous processing

Move work off the critical path by using queues and background workers.

When to use:

  • Operations that don't need immediate results (email, notifications, analytics).
  • Smoothing out traffic spikes by buffering work.
  • Isolating slow operations from user-facing latency.

Trade-offs:

  • Adds complexity (message brokers, retry logic, dead-letter handling).
  • Introduces eventual consistency—users may not see results immediately.
  • Requires monitoring of queue depths and worker health.
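The shape of the pattern: the request handler enqueues and returns immediately, and a background worker drains the queue. A minimal in-process sketch using the standard library (a production system would use a broker such as RabbitMQ or SQS):

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    while True:
        job = jobs.get()
        if job is None:                    # sentinel: shut the worker down
            break
        results.append(f"sent:{job}")      # stands in for e.g. sending an email
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

def handle_request(user_id):
    jobs.put(f"welcome-email:{user_id}")   # fast: just enqueue
    return "202 Accepted"                  # respond before the work runs

status = handle_request(42)
jobs.join()        # wait for the worker to finish (demo only)
jobs.put(None)
t.join()
```

Note the eventual-consistency trade-off is visible even here: the caller gets `202 Accepted` before the work has actually run.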

Database scaling

Techniques specific to data layer bottlenecks.

Options:

  • Read replicas: Offload read traffic from the primary database.
  • Sharding: Partition data across multiple databases.
  • Connection pooling: Reduce connection overhead.
  • Query optimization: Sometimes the cheapest scaling is fixing bad queries.

Trade-offs:

  • Read replicas introduce replication lag.
  • Sharding adds significant complexity and limits some query patterns.
  • Architectural changes to databases are often expensive to reverse.
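Of these, connection pooling is the cheapest to reason about: a bounded set of connections created once and reused, so each request avoids connection setup cost and the database sees a capped connection count. A toy sketch, with `DummyConn` standing in for a real driver connection (production systems would use a pooler like PgBouncer or the driver's built-in pool):

```python
import queue

class DummyConn:
    """Placeholder for a real database connection."""
    def __init__(self, n):
        self.n = n

class Pool:
    def __init__(self, size, factory):
        self._q = queue.Queue(maxsize=size)
        for i in range(size):
            self._q.put(factory(i))          # connections created once, up front

    def acquire(self, timeout=5):
        return self._q.get(timeout=timeout)  # blocks if the pool is exhausted

    def release(self, conn):
        self._q.put(conn)                    # return the connection for reuse

pool = Pool(size=2, factory=DummyConn)
c = pool.acquire()
pool.release(c)   # the same object is handed to the next caller
```

The `maxsize` bound is the point: it converts unbounded connection growth into bounded queuing, which is far easier to alert on.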

Capacity planning process

Step 1 — Understand current capacity

You can't plan if you don't know where you are. For each critical system, establish:

  • Current load: Requests per second, concurrent users, data volume.
  • Current capacity: Maximum load the system can handle before degradation.
  • Headroom: How much room you have between current load and capacity (aim for 30–50% for critical systems).

If you don't have these numbers, measuring them is the first investment.
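Headroom, as used here, is the fraction of capacity still unused. A small sketch of the calculation, using the request-rate numbers from the capacity plan template below as an example:

```python
def headroom(current_load, capacity):
    """Fraction of capacity still unused: (capacity - load) / capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (capacity - current_load) / capacity

# 1,200 req/s against a 2,000 req/s limit leaves 40% headroom.
h = headroom(1200, 2000)
```

Expressing headroom as a fraction of capacity (rather than a multiple of current load) keeps the 30-50% target comparable across metrics with very different units.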

Step 2 — Model growth scenarios

Work with business stakeholders to understand expected growth:

  • What's the projected user growth for the next 6–12 months?
  • Are there planned launches, promotions, or events that will spike traffic?
  • What's the worst-case scenario we need to survive?

Don't over-engineer for 10x growth you may never see. Plan for realistic growth plus a buffer.
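A back-of-envelope model is usually enough: compound the monthly growth rate over the planning horizon, then apply the peak-event multiplier. All numbers below are illustrative assumptions:

```python
def projected_peak_load(current_load, monthly_growth, months, peak_multiplier=1.0):
    """Compound growth over `months`, then scale by the worst expected peak."""
    return current_load * (1 + monthly_growth) ** months * peak_multiplier

# Assumed: 1,000 req/s today, 7% monthly growth, a 3x peak event in 6 months.
need = projected_peak_load(1000, 0.07, 6, peak_multiplier=3.0)  # ~4,500 req/s
```

Comparing this projected peak against measured capacity tells you when headroom runs out, which is the date that drives the intervention plan.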

Step 3 — Identify bottlenecks

Not every component needs to scale. Find the limiting factors:

  • Which services are closest to capacity?
  • What fails first under load testing?
  • What does the on-call team page for most frequently?

Focus scaling investment where it will have the most impact.

Step 4 — Plan interventions

For each bottleneck, decide:

  • What's the scaling strategy (vertical, horizontal, caching, etc.)?
  • What's the cost of the intervention?
  • When does it need to be done to stay ahead of growth?
  • What's the rollback plan if it doesn't work?

Document these decisions in an ADR or capacity planning document.

Step 5 — Monitor and adjust

Capacity planning is not a one-time exercise:

  • Track usage trends monthly.
  • Review capacity quarterly.
  • Update plans when growth assumptions change.

Set alerts for when usage approaches thresholds, not just when you've hit them.
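A two-tier threshold makes "approaching" concrete: warn while there is still time to plan, page only when headroom is nearly gone. The floor values below are assumptions consistent with the 30-50% headroom target above:

```python
WARN_HEADROOM = 0.30   # assumed: below this, schedule work
PAGE_HEADROOM = 0.10   # assumed: below this, act now

def capacity_alert(current_load, capacity):
    """Classify a metric by remaining headroom."""
    headroom = (capacity - current_load) / capacity
    if headroom < PAGE_HEADROOM:
        return "page"
    if headroom < WARN_HEADROOM:
        return "warn"
    return "ok"

# 1,700 req/s against a 2,000 req/s limit leaves 15% headroom.
level = capacity_alert(1700, 2000)
```

In a real system this check would run against time-series data in your monitoring stack; the point is that the warn tier fires while the fix is still a roadmap item, not an incident.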


Roles and ownership

| Role | Responsibilities |
| ---- | ---------------- |
| Platform/Infrastructure Team | Own capacity monitoring, scaling infrastructure, and cost management. Provide tooling for teams to understand their service capacity. |
| Product Teams | Understand the capacity characteristics of their services. Raise concerns when approaching limits. Participate in capacity planning for their domain. |
| Engineering Leadership | Prioritize scalability investments. Ensure capacity planning is part of the roadmap process. Balance capacity spend with other investments. |
| Finance/Ops | Provide growth projections. Partner on cost modeling for scaling options. |

Templates and artifacts

Capacity planning document

# Capacity Plan: [System/Service]

**Last updated:** [Date]
**Owner:** [Name]

## Current state

| Metric               | Current value | Capacity limit | Headroom |
| -------------------- | ------------- | -------------- | -------- |
| Requests/sec         | 1,200         | 2,000          | 40%      |
| Database connections | 80            | 100            | 20%      |
| Queue depth (p99)    | 150           | 500            | 70%      |
| Memory utilization   | 65%           | 80%            | 19%      |

## Growth projection

- **Expected growth:** 50% increase in traffic over next 6 months
- **Peak events:** Black Friday (3x normal), product launch Q2 (2x)
- **Worst case:** 4x normal traffic sustained for 24 hours

## Bottleneck analysis

| Bottleneck           | Impact                               | Urgency           |
| -------------------- | ------------------------------------ | ----------------- |
| Database connections | Will hit limit at 25% traffic growth | High—address Q1   |
| Memory utilization   | Approaching threshold                | Medium—address Q2 |
| Request capacity     | Sufficient headroom                  | Low—monitor       |

## Scaling plan

### Database connections (Q1)

**Strategy:** Implement connection pooling + add read replica
**Cost:** $X/month for replica
**Timeline:** 3 weeks
**Owner:** [Name]
**Rollback:** Revert to direct connections if pooler causes issues

### Memory optimization (Q2)

**Strategy:** Right-size instances, optimize memory-heavy operations
**Cost:** Neutral to negative (savings)
**Timeline:** 2 weeks
**Owner:** [Name]

## Monitoring

- Alert when headroom < 20% on any critical metric
- Weekly capacity report to [channel]
- Quarterly capacity review meeting

## Review schedule

- Next review: [Date]
- Trigger for emergency review: Growth exceeds projection by >20%

Load testing checklist

# Load Testing: [Service]

**Date:** [Date]
**Owner:** [Name]

## Pre-test

- [ ] Baseline metrics captured
- [ ] Test environment isolated or production-safe
- [ ] Rollback plan documented
- [ ] Stakeholders notified
- [ ] Monitoring dashboards ready

## Test scenarios

- [ ] Steady-state load (normal traffic)
- [ ] Peak load (expected maximum)
- [ ] Spike load (sudden 2x increase)
- [ ] Soak test (sustained load over hours)

## Metrics to capture

- [ ] Latency (p50, p95, p99)
- [ ] Error rate
- [ ] Throughput (requests/sec)
- [ ] Resource utilization (CPU, memory, connections)
- [ ] Queue depths
- [ ] Dependency performance

## Results

| Scenario     | Result      | Notes |
| ------------ | ----------- | ----- |
| Steady-state | Pass / Fail |       |
| Peak load    | Pass / Fail |       |
| Spike load   | Pass / Fail |       |
| Soak test    | Pass / Fail |       |

## Bottlenecks identified

1. [Bottleneck]: [Description]
2. [Bottleneck]: [Description]

## Follow-up actions

- [ ] [Action with owner]

Signals that scalability practices are working

| Signal | What it indicates |
| ------ | ----------------- |
| Scaling happens before outages, not during | Proactive planning is working |
| Traffic spikes don't cause pages | Headroom is sufficient |
| Capacity is a roadmap item, not a crisis | Prioritization is effective |
| Teams know their service limits | Ownership and visibility are clear |
| Cost grows slower than traffic | Efficiency improvements are landing |

Failure modes and mitigations

| Failure mode | What it looks like | Mitigation |
| ------------ | ------------------ | ---------- |
| Over-engineering | Scaling for 100x growth that never comes; high costs for unused capacity | Plan for realistic growth + buffer; review assumptions regularly |
| Under-engineering | Discovering limits during outages; reactive firefighting | Measure current capacity; set alerts before limits |
| Scaling the wrong thing | Expensive investments that don't address the actual bottleneck | Load test to identify real limits; focus on the constraint |
| Ignoring cost | Capacity grows but so does the bill, unsustainably | Include cost in capacity planning; set budgets and monitor spend |
| One-time planning | Plan created and never updated; becomes stale | Quarterly reviews; automated monitoring of headroom |