
Distributed Systems

Overview

Distributed systems coordinate independent nodes over unreliable networks. Core concepts include clocks, failure models, consensus, replication, and consistency trade-offs.

Why This Exists

Large-scale systems are inherently distributed; understanding the fundamental limits prevents designs that promise the impossible and explains real-world failure behavior.

How It Works

Topics:

  • CAP theorem as a teaching lens, not a literal binary choice
  • Linearizability, serializability, and eventual consistency
  • Vector clocks (see the sketch after this list)
  • Leader election, with Raft/Paxos at a high level
  • Idempotency and end-to-end exactly-once semantics
  • Byzantine vs crash faults
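A minimal vector clock sketch in Python, assuming a dict of node ID to counter (the node names and event flow below are illustrative, not from the source):

from typing import Dict

Clock = Dict[str, int]

def increment(clock: Clock, node: str) -> Clock:
    """Tick the local counter before a send or a local event."""
    out = dict(clock)
    out[node] = out.get(node, 0) + 1
    return out

def merge(local: Clock, received: Clock) -> Clock:
    """On receive: take the element-wise max of both clocks."""
    return {k: max(local.get(k, 0), received.get(k, 0))
            for k in set(local) | set(received)}

def happened_before(a: Clock, b: Clock) -> bool:
    """a -> b iff every entry of a is <= b and at least one is strictly less."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

# Two writes are concurrent when neither happened-before the other,
# which is how replicas detect conflicting updates.
x = increment({}, "node-a")   # {'node-a': 1}
y = increment({}, "node-b")   # {'node-b': 1}
print(happened_before(x, y), happened_before(y, x))  # False False -> concurrent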

Architecture

graph TB
  subgraph Cluster
    L[Leader]
    F1[Follower]
    F2[Follower]
  end
  L --> F1
  L --> F2
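The leader/follower shape in the diagram maps to quorum-acknowledged writes: the leader commits an entry only once a majority of the cluster has it. A toy sketch, assuming in-memory followers that always ack (the Leader/Follower classes are illustrative, not a real Raft log):

from typing import List

class Follower:
    def __init__(self) -> None:
        self.log: List[str] = []

    def append(self, entry: str) -> bool:
        # A real follower could be slow, crashed, or partitioned; here it always acks.
        self.log.append(entry)
        return True

class Leader:
    def __init__(self, followers: List[Follower]) -> None:
        self.followers = followers
        self.log: List[str] = []
        self.commit_index = -1

    def replicate(self, entry: str) -> bool:
        """Commit only after a majority of the cluster (leader + followers) has the entry."""
        self.log.append(entry)
        acks = 1  # the leader counts itself
        for f in self.followers:
            if f.append(entry):
                acks += 1
        majority = (len(self.followers) + 1) // 2 + 1
        if acks >= majority:
            self.commit_index = len(self.log) - 1
            return True
        return False

cluster = Leader([Follower(), Follower()])
print(cluster.replicate("set x=1"))  # True once 2 of 3 nodes hold the entry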

Key Concepts

Networks partition

Assume messages can be delayed, duplicated, or dropped; design APIs and storage operations to be safe under retries.

Code Examples

  • PUT /items/{id} with the full resource state is naturally idempotent
  • POST /charges with an Idempotency-Key header lets the server deduplicate retries (sketched below)
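A server-side sketch of Idempotency-Key dedupe for POST /charges, where an in-memory dict stands in for a durable response store (endpoint, field names, and key format are illustrative assumptions):

from typing import Dict, Tuple

_responses: Dict[str, Tuple[int, dict]] = {}  # idempotency key -> cached (status, body)

def create_charge(idempotency_key: str, amount_cents: int) -> Tuple[int, dict]:
    if idempotency_key in _responses:
        # A retry of a request already processed: return the stored result, do not charge again.
        return _responses[idempotency_key]
    charge = {"id": f"ch_{idempotency_key[:8]}", "amount_cents": amount_cents}
    result = (201, charge)
    _responses[idempotency_key] = result
    return result

# A client that times out and retries with the same key gets the same charge back.
print(create_charge("key-123", 500))
print(create_charge("key-123", 500))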

Interview Questions

What is the two generals problem?

Two parties communicating over an unreliable channel cannot guarantee agreement on a coordinated action, no matter how many acknowledgments they exchange; this motivates why real protocols rely on timeouts and probabilistic guarantees rather than certainty.

Explain split-brain and mitigation.

Multiple nodes believe they are primary; mitigate with quorum (majority), fencing tokens, and careful failover automation.
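A fencing-token sketch: storage tracks the highest token it has seen and rejects writes carrying a stale token, so an old primary that lost the lock cannot corrupt state (the FencedStore class and token values are illustrative assumptions):

class FencedStore:
    def __init__(self) -> None:
        self.highest_token = 0
        self.value = None

    def write(self, token: int, value: str) -> bool:
        if token < self.highest_token:
            # A write from an old "primary" that lost the lock but does not know it yet.
            return False
        self.highest_token = token
        self.value = value
        return True

store = FencedStore()
print(store.write(33, "from old leader"))   # True
print(store.write(34, "from new leader"))   # True
print(store.write(33, "stale retry"))       # False -- fenced off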

Practice Problems

  • Compare strong consistency in a single-region DB vs cross-region replication
  • Design a distributed lock with lease expiration and fencing

Resources