Platform Scalability¶
Scalability is the ability of a system to handle increased load without degrading performance or reliability. But scalability as a practice is not just about adding resources—it's about understanding your limits, planning before you hit them, and making scaling decisions that don't require heroics.
This page covers how to think about scalability, when to invest in it, and how to build systems that grow with your business instead of blocking it.
What problem this solves¶
Growth is supposed to be good news. But when systems can't handle growth, it becomes a crisis: outages during traffic spikes, degraded user experience, and engineering teams stuck in reactive firefighting instead of building.
Scalability planning solves this by:
- Identifying bottlenecks before they become outages.
- Creating headroom so growth doesn't require emergency work.
- Making scaling decisions intentional, not reactive.
- Balancing cost and capacity appropriately.
The cost of not planning is that scaling becomes crisis management. You find limits during incidents, not during planning.
When to invest in scalability¶
Invest now if:¶
- You're seeing performance degradation during peak traffic.
- Capacity limits have caused outages in the last quarter.
- Business growth is projected to exceed current headroom within 6 months.
- You're launching to a new market or customer segment with different usage patterns.
- On-call is regularly paged for capacity-related issues.
Defer if:¶
- Current headroom is sufficient for the next 12 months.
- Usage patterns are stable and well-understood.
- Other themes (reliability, security) are more pressing.
- You're in a cash-constrained environment and can accept some risk.
The goal is not to scale for hypothetical future load—it's to have enough headroom that growth doesn't become an emergency.
Scaling strategies¶
Vertical scaling¶
Add more resources to existing instances—bigger machines, more memory, faster CPUs.
When to use:
- Quick wins when you have a clear resource bottleneck.
- Simpler architectures where horizontal scaling isn't worth the complexity.
- Databases where horizontal scaling introduces coordination overhead.
Trade-offs:
- Limited by the largest available instance size.
- Often more expensive per unit of capacity at scale.
- Single points of failure unless combined with redundancy.
Horizontal scaling¶
Add more instances of a service and distribute load across them.
When to use:
- Stateless services that can run in parallel.
- When you've hit vertical limits.
- When you need redundancy as well as capacity.
Trade-offs:
- Requires load balancing and session management.
- Introduces coordination complexity for stateful operations.
- Not all systems scale horizontally without architectural changes.
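The distribution idea behind horizontal scaling can be sketched in a few lines. This is a toy round-robin balancer with hypothetical instance names; real deployments use a dedicated L4/L7 load balancer (NGINX, Envoy, a cloud ALB), which also handles health checks and connection draining:

```python
import itertools

class RoundRobinBalancer:
    """Toy round-robin distribution across identical replicas.

    Illustration only: it shows how adding instances adds capacity,
    not how production load balancing works.
    """

    def __init__(self, instances: list[str]):
        self._cycle = itertools.cycle(instances)

    def pick(self) -> str:
        # Each request goes to the next instance in rotation.
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [lb.pick() for _ in range(4)]  # → app-1, app-2, app-3, app-1
```

Adding a fourth replica is one list entry here; in practice it's an autoscaling rule or a deploy, which is exactly why stateless services scale so cleanly.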
Caching¶
Store frequently accessed data closer to the consumer to reduce load on origin systems.
When to use:
- Read-heavy workloads with cacheable data.
- Reducing latency for geographically distributed users.
- Protecting databases from repetitive queries.
Trade-offs:
- Cache invalidation is hard—stale data is a real risk.
- Adds operational complexity (cache servers, eviction policies).
- Not effective for write-heavy or highly dynamic data.
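The read-through pattern at the heart of most caching can be sketched with an in-process TTL cache. This is a minimal example with a hypothetical `fetch_profile_from_db` origin call; production systems typically use Redis or Memcached, which add shared state across instances and richer eviction policies:

```python
import time

def fetch_profile_from_db(user_id: str) -> dict:
    # Stand-in for the origin system (a real database query).
    return {"id": user_id, "name": "example"}

class TTLCache:
    """Minimal in-memory cache with time-based expiry."""

    def __init__(self, ttl_seconds: float):
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # stale: evict and treat as a miss
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self._ttl, value)

def get_user_profile(cache: TTLCache, user_id: str) -> dict:
    """Read-through: serve from cache, fall back to the origin on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    profile = fetch_profile_from_db(user_id)
    cache.set(user_id, profile)
    return profile

cache = TTLCache(ttl_seconds=60)
first = get_user_profile(cache, "u1")   # miss: hits the origin
second = get_user_profile(cache, "u1")  # hit: origin is not touched
```

The TTL is the invalidation trade-off in miniature: a longer TTL shields the origin more but serves staler data.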
Asynchronous processing¶
Move work off the critical path by using queues and background workers.
When to use:
- Operations that don't need immediate results (email, notifications, analytics).
- Smoothing out traffic spikes by buffering work.
- Isolating slow operations from user-facing latency.
Trade-offs:
- Adds complexity (message brokers, retry logic, dead-letter handling).
- Introduces eventual consistency—users may not see results immediately.
- Requires monitoring of queue depths and worker health.
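The queue-and-worker shape can be sketched with the standard library alone. A real system would use a broker (RabbitMQ, SQS, Kafka) plus retry and dead-letter handling; this only shows the slow work moving off the request path:

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
sent: list[str] = []

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:             # sentinel: shut the worker down
            jobs.task_done()
            return
        sent.append(f"sent:{job}")  # the slow work (e.g. an email send)
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# The user-facing request path only enqueues and returns immediately.
for recipient in ["a@example.com", "b@example.com"]:
    jobs.put(recipient)

jobs.put(None)  # signal shutdown
jobs.join()     # wait for the backlog to drain
```

Note the eventual-consistency trade-off is visible even here: between `put` and the worker's append, the result simply doesn't exist yet.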
Database scaling¶
Techniques specific to data layer bottlenecks.
Options:
- Read replicas: Offload read traffic from the primary database.
- Sharding: Partition data across multiple databases.
- Connection pooling: Reduce connection overhead.
- Query optimization: Sometimes the cheapest scaling is fixing bad queries.
Trade-offs:
- Read replicas introduce replication lag.
- Sharding adds significant complexity and limits some query patterns.
- Architectural changes to databases are often expensive to reverse.
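As an illustration of why connection pooling cuts overhead, here is a minimal fixed-size pool. It is a sketch, not production code; real deployments use PgBouncer or a driver-level pool, which also handle health checks, timeouts, and reconnection:

```python
import queue

class ConnectionPool:
    """Minimal fixed-size pool: opens `size` connections once, then reuses them.

    Capping total connections this way is what keeps the database's
    connection limit from becoming a per-request cost.
    """

    def __init__(self, connect, size: int):
        self._pool: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout: float = 5.0):
        # Blocks until a connection frees up instead of opening a new one.
        return self._pool.get(timeout=timeout)

    def release(self, conn) -> None:
        self._pool.put(conn)

# `object` stands in for a real driver's connect() function.
pool = ConnectionPool(connect=object, size=2)
conn = pool.acquire()
# ... run queries ...
pool.release(conn)
```

The `size` parameter is the knob that matters: it converts "one connection per request" into a fixed ceiling the database can actually sustain.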
Capacity planning process¶
Step 1 — Understand current capacity¶
You can't plan if you don't know where you are. For each critical system, establish:
- Current load: Requests per second, concurrent users, data volume.
- Current capacity: Maximum load the system can handle before degradation.
- Headroom: How much room you have between current load and capacity (aim for 30–50% for critical systems).
If you don't have these numbers, measuring them is the first investment.
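The headroom definition is simple enough to standardize across teams. A minimal sketch, treating headroom as the unused fraction of total capacity:

```python
def headroom_pct(current_load: float, capacity: float) -> float:
    """Remaining headroom as a percentage of total capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return (capacity - current_load) / capacity * 100

# 1,200 req/s against a 2,000 req/s ceiling leaves 40% headroom,
# inside the 30-50% target for critical systems.
req_headroom = headroom_pct(current_load=1200, capacity=2000)
```

Whichever definition you pick, use it consistently: mixing "percent of capacity" with "percentage points remaining" makes dashboards and plans disagree.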
Step 2 — Model growth scenarios¶
Work with business stakeholders to understand expected growth:
- What's the projected user growth for the next 6–12 months?
- Are there planned launches, promotions, or events that will spike traffic?
- What's the worst-case scenario we need to survive?
Don't over-engineer for 10x growth you may never see. Plan for realistic growth plus a buffer.
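One way to turn a growth assumption into a deadline is to compute how long the current headroom lasts. A sketch, assuming a steady compound monthly growth rate; planned launches and seasonal spikes need separate modeling on top:

```python
import math

def months_until_capacity(current_load: float, capacity: float,
                          monthly_growth_pct: float) -> float:
    """Months before compound growth consumes the remaining headroom.

    Solves current_load * rate**m = capacity for m.
    """
    if current_load >= capacity:
        return 0.0
    rate = 1 + monthly_growth_pct / 100
    return math.log(capacity / current_load) / math.log(rate)

# At 10% monthly growth, 50% headroom is gone in roughly seven months.
runway = months_until_capacity(current_load=100, capacity=200,
                               monthly_growth_pct=10)
```

The output is the scheduling signal: if the runway is shorter than the lead time of the fix, the intervention belongs on the roadmap now.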
Step 3 — Identify bottlenecks¶
Not every component needs to scale. Find the limiting factors:
- Which services are closest to capacity?
- What fails first under load testing?
- What does the on-call team get paged for most frequently?
Focus scaling investment where it will have the most impact.
Step 4 — Plan interventions¶
For each bottleneck, decide:
- What's the scaling strategy (vertical, horizontal, caching, etc.)?
- What's the cost of the intervention?
- When does it need to be done to stay ahead of growth?
- What's the rollback plan if it doesn't work?
Document these decisions in an ADR or capacity planning document.
Step 5 — Monitor and adjust¶
Capacity planning is not a one-time exercise:
- Track usage trends monthly.
- Review capacity quarterly.
- Update plans when growth assumptions change.
Set alerts for when usage approaches thresholds, not just when you've hit them.
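An approaching-threshold check can be sketched as a function over (current, limit) pairs. The metric names and the 20% default below are illustrative; in practice this logic lives in your monitoring system's alert rules:

```python
def check_headroom(metrics: dict[str, tuple[float, float]],
                   warn_below_pct: float = 20.0) -> list[str]:
    """Warn on metrics whose headroom has dropped below the threshold.

    `metrics` maps a metric name to (current_value, capacity_limit).
    """
    warnings = []
    for name, (current, limit) in metrics.items():
        headroom = (limit - current) / limit * 100
        if headroom < warn_below_pct:
            warnings.append(f"{name}: {headroom:.0f}% headroom remaining")
    return warnings

alerts = check_headroom({
    "requests_per_sec": (1200, 2000),  # 40% headroom: fine
    "db_connections": (85, 100),       # 15% headroom: warn
})
```

The point of alerting on headroom rather than on the limit itself is lead time: the warning fires while there is still room to act.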
Roles and ownership¶
| Role | Responsibilities |
|---|---|
| Platform/Infrastructure Team | Own capacity monitoring, scaling infrastructure, and cost management. Provide tooling for teams to understand their service capacity. |
| Product Teams | Understand the capacity characteristics of their services. Raise concerns when approaching limits. Participate in capacity planning for their domain. |
| Engineering Leadership | Prioritize scalability investments. Ensure capacity planning is part of the roadmap process. Balance capacity spend with other investments. |
| Finance/Ops | Provide growth projections. Partner on cost modeling for scaling options. |
Templates and artifacts¶
Capacity planning document¶
# Capacity Plan: [System/Service]
**Last updated:** [Date]
**Owner:** [Name]
## Current state
| Metric | Current value | Capacity limit | Headroom |
| -------------------- | ------------- | -------------- | -------- |
| Requests/sec | 1,200 | 2,000 | 40% |
| Database connections | 80 | 100 | 20% |
| Queue depth (p99) | 150 | 500 | 70% |
| Memory utilization | 65% | 80% | 19% |
## Growth projection
- **Expected growth:** 50% increase in traffic over next 6 months
- **Peak events:** Black Friday (3x normal), product launch Q2 (2x)
- **Worst case:** 4x normal traffic sustained for 24 hours
## Bottleneck analysis
| Bottleneck | Impact | Urgency |
| -------------------- | ------------------------------------ | ----------------- |
| Database connections | Will hit limit at 25% traffic growth | High—address Q1 |
| Memory utilization | Approaching threshold | Medium—address Q2 |
| Request capacity | Sufficient headroom | Low—monitor |
## Scaling plan
### Database connections (Q1)
**Strategy:** Implement connection pooling + add read replica
**Cost:** $X/month for replica
**Timeline:** 3 weeks
**Owner:** [Name]
**Rollback:** Revert to direct connections if pooler causes issues
### Memory optimization (Q2)
**Strategy:** Right-size instances, optimize memory-heavy operations
**Cost:** Neutral to negative (savings)
**Timeline:** 2 weeks
**Owner:** [Name]
## Monitoring
- Alert when headroom < 20% on any critical metric
- Weekly capacity report to [channel]
- Quarterly capacity review meeting
## Review schedule
- Next review: [Date]
- Trigger for emergency review: Growth exceeds projection by >20%
Load testing checklist¶
# Load Testing: [Service]
**Date:** [Date]
**Owner:** [Name]
## Pre-test
- [ ] Baseline metrics captured
- [ ] Test environment isolated or production-safe
- [ ] Rollback plan documented
- [ ] Stakeholders notified
- [ ] Monitoring dashboards ready
## Test scenarios
- [ ] Steady-state load (normal traffic)
- [ ] Peak load (expected maximum)
- [ ] Spike load (sudden 2x increase)
- [ ] Soak test (sustained load over hours)
## Metrics to capture
- [ ] Latency (p50, p95, p99)
- [ ] Error rate
- [ ] Throughput (requests/sec)
- [ ] Resource utilization (CPU, memory, connections)
- [ ] Queue depths
- [ ] Dependency performance
## Results
| Scenario | Result | Notes |
| ------------ | ----------- | ----- |
| Steady-state | Pass / Fail | |
| Peak load | Pass / Fail | |
| Spike load | Pass / Fail | |
| Soak test | Pass / Fail | |
## Bottlenecks identified
1. [Bottleneck]: [Description]
2. [Bottleneck]: [Description]
## Follow-up actions
- [ ] [Action with owner]
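The latency percentiles in the checklist relate to raw samples as follows. A nearest-rank sketch with hypothetical numbers; load testing tools such as k6 or Locust report these directly:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: good enough for a quick load-test readout."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latency samples in milliseconds; note how a single slow
# request dominates p99 while leaving p50 untouched.
latencies_ms = [12, 15, 14, 13, 250, 16, 14, 15, 13, 12]
p50 = percentile(latencies_ms, 50)  # 14
p99 = percentile(latencies_ms, 99)  # 250
```

This asymmetry is why the checklist captures p50, p95, and p99 separately: averages hide exactly the tail behavior that load tests exist to find.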
Signals that scalability practices are working¶
| Signal | What it indicates |
|---|---|
| Scaling happens before outages, not during | Proactive planning is working |
| Traffic spikes don't cause pages | Headroom is sufficient |
| Capacity is a roadmap item, not a crisis | Prioritization is effective |
| Teams know their service limits | Ownership and visibility are clear |
| Cost grows slower than traffic | Efficiency improvements are landing |
Failure modes and mitigations¶
| Failure mode | What it looks like | Mitigation |
|---|---|---|
| Over-engineering | Scaling for 100x growth that never comes; high costs for unused capacity | Plan for realistic growth + buffer; review assumptions regularly |
| Under-engineering | Discovering limits during outages; reactive firefighting | Measure current capacity; set alerts before limits |
| Scaling the wrong thing | Expensive investments that don't address the actual bottleneck | Load test to identify real limits; focus on the constraint |
| Ignoring cost | Capacity grows but so does the bill, unsustainably | Include cost in capacity planning; set budgets and monitor spend |
| One-time planning | Plan created and never updated; becomes stale | Quarterly reviews; automated monitoring of headroom |
Related pages¶
- Platform Themes — How scalability fits into the broader platform investment framework.
- Reliability Practices — Scalability and reliability are closely linked.
- Delivery: Metrics in Execution — Operational metrics that include capacity signals.
- Scaling: Scaling Systems — Broader strategies for technical growth.
- Resources: Runbook Template — Include scaling procedures in operational runbooks.