
Embeddings

Overview

Embeddings map text (or images) into dense vectors so that semantic similarity corresponds to proximity in vector space. They power search, clustering, classification, and retrieval for RAG.

Why This Exists

Keyword search misses paraphrases; embeddings capture meaning well enough for nearest-neighbor retrieval at scale.

How It Works

Common training approaches include contrastive learning and dual (bi-) encoder architectures. Operational concerns include vector normalization, dimensionality, domain mismatch, staleness as models are updated, and batching for throughput.
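A minimal sketch of normalization and batching in plain Python; the hashed bag-of-words encoder here is a toy stand-in for a real embedding model, and the helper names are illustrative only:

import math

def embed_batch(texts: list[str]) -> list[list[float]]:
    # Toy stand-in for a real embedding model: a tiny hashed bag-of-words.
    dim = 8
    vectors = []
    for text in texts:
        v = [0.0] * dim
        for tok in text.lower().split():
            v[hash(tok) % dim] += 1.0
        vectors.append(v)
    return vectors

def l2_normalize(vec: list[float]) -> list[float]:
    # Unit-length vectors let cosine similarity reduce to a plain dot product.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def embed_corpus(texts: list[str], batch_size: int = 64) -> list[list[float]]:
    # Batch the model calls for throughput, then normalize every vector.
    out: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        out.extend(l2_normalize(v) for v in embed_batch(texts[i:i + batch_size]))
    return out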

Architecture


flowchart LR
    Text[Text] --> Enc[Encoder]
    Enc --> Vec[Vector]
    Vec --> ANN[Approximate NN index]
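A compact sketch of this flow at query time, using an exact brute-force scan in place of a real ANN index (HNSW, IVF, etc.); it assumes all vectors are already L2-normalized:

import heapq

def top_k(query: list[float], index: list[list[float]], k: int = 3) -> list[tuple[float, int]]:
    # Score every indexed vector by dot product (equal to cosine on unit vectors)
    # and keep the k best. A real system replaces this scan with an ANN index.
    scored = ((sum(q * d for q, d in zip(query, doc)), i) for i, doc in enumerate(index))
    return heapq.nlargest(k, scored)

index = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(top_k([0.8, 0.6], index, k=2))  # [(0.96, 2), (0.8, 0)]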

Key Concepts

Metric choice: cosine similarity on L2-normalized vectors is common; verify that the metric matches your index library's defaults.
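A quick way to sanity-check the relationship: on unit vectors the dot product equals cosine similarity, and squared Euclidean distance equals 2 - 2 * cosine, so all three rank neighbors the same way. A small self-check with random vectors:

import math
import random

random.seed(0)
a = [random.gauss(0, 1) for _ in range(16)]
b = [random.gauss(0, 1) for _ in range(16)]
na = math.sqrt(sum(x * x for x in a))
nb = math.sqrt(sum(y * y for y in b))
a = [x / na for x in a]  # L2-normalize both vectors
b = [y / nb for y in b]

dot = sum(x * y for x, y in zip(a, b))             # equals cosine on unit vectors
dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))  # squared Euclidean distance
assert abs(dist_sq - (2 - 2 * dot)) < 1e-9         # same neighbor ordering follows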

Code Examples

import math

def cosine(a: list[float], b: list[float]) -> float:
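    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""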
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
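Usage:

print(cosine([1.0, 0.0], [0.7, 0.7]))  # ~0.7071, i.e. vectors 45 degrees apart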

Interview Questions

Why re-embed documents when upgrading models?

Vector spaces are not comparable across unrelated models, so plan migrations with dual-write/dual-query strategies.
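A toy sketch of the dual-write idea during a migration; the dict-backed indexes and the two encoders below are made-up placeholders, not any particular library:

old_index: dict[str, list[float]] = {}
new_index: dict[str, list[float]] = {}

def embed_old(text: str) -> list[float]:
    # Stand-in for the current production model.
    return [float(len(text)), float(text.count(" "))]

def embed_new(text: str) -> list[float]:
    # Stand-in for the upgraded model; its vector space is unrelated to embed_old's.
    return [float(sum(map(ord, text)) % 97), len(text) / 10.0]

def index_document(doc_id: str, text: str) -> None:
    # Dual-write: each model's vectors live in their own index, never mixed,
    # so queries can be dual-issued and cut over once the new index is backfilled.
    old_index[doc_id] = embed_old(text)
    new_index[doc_id] = embed_new(text)

index_document("doc1", "how do I reset my password")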

What are Matryoshka embeddings?

Embeddings trained so that prefixes of the vector remain informative, which enables flexible storage and retrieval tiers: truncate to fewer dimensions to trade a little accuracy for lower cost.
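A minimal sketch of how such a vector is consumed: keep the leading k dimensions and re-normalize before comparing (this only works if the model was trained Matryoshka-style; the sample vector is made up):

import math

def truncate(vec: list[float], k: int) -> list[float]:
    # Keep only the first k dimensions, then rescale to unit length so
    # dot-product/cosine comparisons stay valid at the smaller size.
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [0.12, -0.40, 0.33, 0.08, -0.05, 0.02, 0.01, -0.01]
print(truncate(full, 4))  # cheaper storage tier: 4 dims instead of 8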

Practice Problems

  • Evaluate embedding quality with a small labeled relevance set (a recall@k sketch follows this list)
  • Compare bi-encoder retrieval against cross-encoder reranking on result quality and added latency
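For the first problem, a minimal recall@k sketch over a tiny hand-labeled set; the queries, document ids, and judgments below are made up for illustration:

def recall_at_k(retrieved: dict[str, list[str]], relevant: dict[str, set[str]], k: int = 5) -> float:
    # Fraction of labeled-relevant docs that appear in the top-k results, averaged over queries.
    scores = []
    for query, rel in relevant.items():
        hits = len(set(retrieved.get(query, [])[:k]) & rel)
        scores.append(hits / len(rel) if rel else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

relevant = {"reset password": {"doc3"}, "refund policy": {"doc7", "doc9"}}
retrieved = {"reset password": ["doc3", "doc1"], "refund policy": ["doc2", "doc7", "doc4"]}
print(recall_at_k(retrieved, relevant, k=3))  # 1.0 and 0.5 -> 0.75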

Resources