5. Cosine Similarity

Cosine similarity is how RAG measures "how close in meaning" a query vector is to a chunk vector. It's the score behind the ranking in the previous part.

The idea

It measures the angle between two vectors, not their length. Two texts whose vectors point in the same direction are similar regardless of how long those vectors are, so the score reflects direction (meaning) rather than magnitude.

   similar (small angle)            dissimilar (large angle)
        ↑  ↗                              ↑
        | /                               |        ↘
        |/                                |          ↘
        └──────▶                          └──────────────▶
   cos θ → 1 (very similar)          cos θ → 0 (unrelated)

The formula

cosine(A, B) =        A · B
                ─────────────────
                  ‖A‖ × ‖B‖

A · B is the dot product.
‖A‖ is the vector's length (magnitude).
Result ranges from −1 (opposite) through 0 (unrelated) to 1 (identical direction).
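To make the formula concrete, here is a small worked sketch in plain Python. The toy 2-D vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions), and the helper name `cosine_by_hand` is hypothetical.

```python
import math

def cosine_by_hand(a, b):
    """Cosine similarity straight from the formula: (A · B) / (‖A‖ × ‖B‖)."""
    dot = sum(x * y for x, y in zip(a, b))        # A · B
    norm_a = math.sqrt(sum(x * x for x in a))     # ‖A‖
    norm_b = math.sqrt(sum(y * y for y in b))     # ‖B‖
    return dot / (norm_a * norm_b)

# A · B = 3*4 + 4*3 = 24 and ‖A‖ = ‖B‖ = 5, so cosine = 24 / 25
close = cosine_by_hand([3, 4], [4, 3])

# Perpendicular vectors: the dot product is 0, so the similarity is 0
unrelated = cosine_by_hand([1, 0], [0, 1])

# Opposite directions give a similarity of (approximately) -1
opposite = cosine_by_hand([1, 2], [-1, -2])

# Scaling a vector by 10 changes its length but not its direction
scaled = cosine_by_hand([3, 4], [30, 40])

print(close, unrelated, opposite, scaled)
```

The last case shows the point about angle versus length: multiplying a vector by 10 leaves the similarity at 1.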

The optimization that matters

If you normalize vectors to length 1 at embedding time, then ‖A‖ = ‖B‖ = 1, so cosine similarity collapses to just the dot product. That's why production systems normalize once and then do fast dot products over millions of vectors.
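A minimal sketch of that idea, assuming the chunk embeddings fit in one NumPy array. The helper `normalize_rows` and the toy 2-D vectors are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def normalize_rows(matrix):
    """Scale each row to length 1 (assumes no row is all zeros)."""
    matrix = np.asarray(matrix, dtype=float)
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

# Toy "chunk embeddings": 3 chunks, 2 dimensions, normalized once up front
chunks = normalize_rows([[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]])

# Toy "query embedding", normalized the same way
query = normalize_rows([[6.0, 8.0]])[0]

# One matrix-vector product yields the cosine score for every chunk
scores = chunks @ query

# Chunk indices sorted from most to least similar
ranking = np.argsort(-scores)

print(scores, ranking)
```

Because every row already has length 1, no norms are computed at query time; the only work per query is the single product `chunks @ query`.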

Code

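import numpy as np

def cosine(a, b):
    # General case: divide the dot product by the product of the two lengths.
    # Undefined (division by zero) if either vector is all zeros.
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pre-normalized vectors → dot product is the similarity:
def cosine_normalized(a, b):
    # Only valid when both inputs already have length 1.
    return float(np.asarray(a) @ np.asarray(b))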

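Normalize embeddings once, then rank by dot product. The scores match cosine similarity (up to floating-point rounding), and each comparison skips the two norm computations, which adds up when scoring millions of vectors.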

