API Scalability

Overview

Scaling APIs means increasing sustainable throughput under latency SLOs. Levers include horizontal scaling, efficient data access, caching, async processing, and backpressure.

Why This Exists

Traffic grows; incidents happen during peaks. Scalability work connects application design to infrastructure limits and cost.

How It Works

Techniques: stateless app servers, connection pooling, read scaling, partitioning, rate limiting, bulkheads, auto-scaling policies, load testing, and capacity planning. Pair with System design — scalability.

Architecture

flowchart TB
  LB[Load balancer] --> A1[App]
  LB --> A2[App]
  A1 --> C[(Cache)]
  A2 --> C
  A1 --> DB[(DB primary)]
  DB --> R[(Replicas)]

Key Concepts

Latency percentiles matter

Optimizing p99 often matters more than mean latency; trace tail events caused by GC, cold caches, or slow queries.
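The point is easy to see numerically: a single slow request barely moves the mean but dominates p99. Here is a minimal nearest-rank percentile sketch (the `percentile` helper is illustrative, not a standard API):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of the
// samples using the nearest-rank method on a sorted copy.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...) // don't mutate the caller's slice
	sort.Float64s(s)
	rank := int(math.Ceil(p/100*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func main() {
	// Nine fast requests and one 480ms outlier (e.g. a cold cache).
	lat := []float64{12, 15, 14, 13, 480, 16, 15, 14, 13, 12}
	fmt.Printf("p50=%.0fms p99=%.0fms\n", percentile(lat, 50), percentile(lat, 99))
	// p50 stays ~14ms while p99 is pulled to 480ms by one tail event.
}
```

This is why dashboards and SLOs track p95/p99 rather than averages: the tail is where users actually feel GC pauses and slow queries.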

Code Examples

Server-side timeouts bound how long a slow or malicious client can hold a connection, protecting goroutine and file-descriptor budgets:

srv := &http.Server{
  Addr:              ":8080",
  ReadHeaderTimeout: 5 * time.Second,  // cap slow header reads (Slowloris)
  WriteTimeout:      10 * time.Second, // cap slow response writes
  IdleTimeout:       60 * time.Second, // reclaim idle keep-alive connections
}

Interview Questions

What is backpressure?

A mechanism to signal producers to slow down when consumers cannot keep up—essential to avoid unbounded queues and OOMs.

How does connection pooling help?

Reuses expensive TCP/TLS and DB handshakes; unbounded pools can exhaust DB resources—tune max connections per instance.
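Go's database/sql exposes the pool knobs directly. A sketch of the tuning described above, assuming hypothetical numbers (database max_connections=100, 8 app instances, 4 connections reserved for admin use); `maxConnsPerInstance` and `tunePool` are illustrative helpers, while the `Set*` methods are the standard library's real pool API:

```go
package main

import (
	"database/sql"
	"fmt"
	"time"
)

// maxConnsPerInstance divides a database-wide connection budget
// across app instances, reserving headroom for admin sessions.
func maxConnsPerInstance(dbMaxConns, instances, reserved int) int {
	return (dbMaxConns - reserved) / instances
}

// tunePool applies a per-instance budget to database/sql's pool.
// The default MaxOpenConns of 0 means "no limit", which is exactly
// how a fleet of app servers exhausts a database under load.
func tunePool(db *sql.DB, maxConns int) {
	db.SetMaxOpenConns(maxConns)
	db.SetMaxIdleConns(maxConns / 2)               // keep some warm, reuse handshakes
	db.SetConnMaxLifetime(30 * time.Minute)        // recycle so DB failovers drain cleanly
	db.SetConnMaxIdleTime(5 * time.Minute)         // drop idle conns during quiet periods
}

func main() {
	fmt.Println(maxConnsPerInstance(100, 8, 4)) // 12 connections per instance
}
```

The key property is that the budget is per instance: adding app replicas without lowering each instance's cap silently raises the aggregate load on the database.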

Practice Problems

  • Propose a scaling plan for 10× traffic on a read-heavy catalog API
  • Identify bottlenecks in a flame graph showing DB time dominating requests

Resources