Scalability¶
Overview¶
Scalability is the ability to handle growth—more users, data, and regions—without proportional loss of performance or reliability. It spans vertical (bigger machines) and horizontal (more machines).
Why This Exists¶
Products rarely fail from lack of features alone; they fail from outages, cost explosions, or latency regressions under load.
How It Works¶
Dimensions: scale up/out, partitioning, replication, caching, stateless tiers, rate limiting, autoscaling, multi-region strategies. Watch hotspots and coordination overhead.
Architecture¶

flowchart TB
subgraph Region A
LB1[LB] --> AP1[API]
AP1 --> DB1[(DB)]
end
subgraph Region B
LB2[LB] --> AP2[API]
AP2 --> DB2[(DB)]
end
Key Concepts¶
Utilization targets
Run production with headroom; auto-scaling should react before SLO breaches—validate with load tests and chaos experiments.
Code Examples¶
1M DAU, each 20 requests/day -> ~230 RPS average
Peak ~10x average -> ~2.3k RPS; size instances accordingly
Interview Questions¶
What is the difference between strong and eventual consistency?
Strong consistency guarantees reads reflect latest writes; eventual consistency allows temporary divergence across replicas.
Name a coordination service bottleneck.
A single global counter or lock can serialize traffic—shard or approximate (e.g., HLL for counts).
Practice Problems¶
- Scale a social feed with fan-out on write vs read
- Identify bottlenecks in a monolithic checkout service during peak sales