5. Cosine Similarity
Cosine similarity is how RAG measures "how close in meaning" a query vector is to a chunk vector. It's the score behind the ranking in the previous part.
The idea
It measures the angle between two vectors, not their lengths. Two texts whose embeddings point in the same direction in vector space count as similar no matter how "long" those vectors are, so the score tracks meaning (direction) rather than magnitude.
Diagram: similar vectors form a small angle (cos θ → 1, very similar); dissimilar vectors form a large angle (cos θ → 0, unrelated).
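A quick way to see that only the angle matters is to scale a vector. Scaling changes its length but not its direction, so its cosine with the original stays 1. Perpendicular vectors score 0 and opposite vectors score −1. The sketch below uses plain NumPy. The helper name `cosine_sim` and the toy 2-D vectors are illustrative assumptions, not real embeddings.

```python
import numpy as np

def cosine_sim(a, b):
    # Hypothetical helper: dot product divided by the product of the vector lengths.
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([3.0, 4.0])                # toy 2-D "embedding" (length 5)
longer = 10 * v                         # same direction, 10x the length
perpendicular = np.array([-4.0, 3.0])   # rotated 90 degrees
opposite = -v                           # rotated 180 degrees

print(cosine_sim(v, longer))         # 1.0: length is ignored
print(cosine_sim(v, perpendicular))  # 0.0: unrelated
print(cosine_sim(v, opposite))       # -1.0: opposite direction
print(float(v @ longer))             # 250.0: the raw dot product does grow with length
```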
The formula
$$\text{cosine}(A, B) = \frac{A \cdot B}{\lVert A \rVert \times \lVert B \rVert}$$
- A · B is the dot product.
- ‖A‖ is the vector's length (magnitude).
- The result ranges from −1 (opposite) through 0 (unrelated) to 1 (identical direction).
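Here is a small worked example with made-up 3-dimensional vectors A = (1, 2, 2) and B = (2, 1, 2). Real embeddings have hundreds or thousands of dimensions, but the arithmetic is the same:

$$
\begin{aligned}
A \cdot B &= 1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2 = 8 \\
\lVert A \rVert &= \sqrt{1^2 + 2^2 + 2^2} = 3, \qquad \lVert B \rVert = \sqrt{2^2 + 1^2 + 2^2} = 3 \\
\text{cosine}(A, B) &= \frac{8}{3 \times 3} = \frac{8}{9} \approx 0.889
\end{aligned}
$$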
The optimization that matters
If you normalize vectors to length 1 at embedding time, then ‖A‖ = ‖B‖ = 1. The denominator of the formula becomes 1, so cosine similarity collapses to just the dot product. That's why many production systems normalize once and then run fast dot products over millions of vectors.
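Here is a minimal NumPy sketch of that idea, assuming the embeddings are stored as the rows of a matrix. The random vectors stand in for real embeddings, 384 dimensions is an arbitrary choice, and the names `chunk_vecs` and `query` are illustrative. The sketch normalizes every row once, then computes all similarities with a single matrix-vector product and checks them against the full cosine formula.

```python
import numpy as np

rng = np.random.default_rng(0)
chunk_vecs = rng.normal(size=(1000, 384))  # stand-in for 1,000 chunk embeddings
query = rng.normal(size=384)               # stand-in for a query embedding

# Full cosine formula: divide each dot product by both vector lengths.
full_cosine = (chunk_vecs @ query) / (
    np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query)
)

# Normalize once (e.g. at indexing time) so every row has length 1.
unit_chunks = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
unit_query = query / np.linalg.norm(query)

# After normalization, a plain dot product is the cosine similarity.
fast_scores = unit_chunks @ unit_query

print(np.allclose(full_cosine, fast_scores))  # True
```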
Code
```python
import numpy as np

def cosine(a, b):
    # Dot product divided by the product of the two vector lengths.
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pre-normalized (unit-length) vectors → the dot product is the similarity:
def cosine_normalized(a, b):
    return float(np.asarray(a) @ np.asarray(b))
```
Normalize embeddings once, then rank by dot product. The scores are identical to cosine similarity, but each comparison skips the two norm computations, and that saving adds up at scale.
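To show the "rank by dot product" step end to end, here is a sketch of top-k retrieval over pre-normalized vectors. The helper names `normalize_rows` and `top_k` and the tiny hand-made 2-D vectors are hypothetical. A real system would use actual chunk embeddings.

```python
import numpy as np

def normalize_rows(m):
    # Hypothetical helper: scale each row (or a single vector) to length 1.
    m = np.asarray(m, dtype=float)
    return m / np.linalg.norm(m, axis=-1, keepdims=True)

def top_k(unit_chunks, unit_query, k=2):
    # Hypothetical helper: dot products against unit vectors are cosine similarities.
    scores = unit_chunks @ unit_query
    # Indices of the k highest scores, best first.
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

chunks = normalize_rows([[1.0, 0.0],    # chunk 0
                         [0.0, 1.0],    # chunk 1
                         [1.0, 1.0]])   # chunk 2
query = normalize_rows([2.0, 1.0])      # works on a single vector too

idx, scores = top_k(chunks, query, k=2)
print(idx)  # [2 0]
```

Chunk 2 wins (score 3/√10 ≈ 0.949) over chunk 0 (2/√5 ≈ 0.894) because its direction is closest to the query's, even though chunk 0 shares the query's dominant axis.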