13. Advanced Retrieval Techniques
Plain top-k cosine search is the baseline. These techniques fix its common failure modes: redundant results, missing context, and irrelevant matches.
Metadata filtering
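Restrict the search space before scoring: filter by source, date, type, or section. This is usually faster and more precise, because fewer vectors are scored and chunks from the wrong source cannot appear in the results.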
# Filter syntax is store-specific; some backends need explicit operators to combine conditions.
retriever = store.as_retriever(search_kwargs={
    "k": 4,
    "filter": {"source": "handbook.pdf", "year": 2026},
})
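To make the "filter first, score second" order concrete, here is a minimal pure-Python sketch. The `Chunk` class, the `cosine` and `filtered_search` helpers, and the toy 2-D vectors are illustrative assumptions, not the vector store's API; a real store applies the filter inside its index rather than with a list comprehension.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(chunks, query_vec, metadata_filter, k=4):
    """Keep only chunks whose metadata matches every filter key, then rank by cosine."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return candidates[:k]


# Hypothetical toy corpus: the 2019 chunk and the blog post are the closest
# vectors to the query, but the filter removes them before scoring.
chunks = [
    Chunk("Vacation policy 2026", [0.9, 0.1], {"source": "handbook.pdf", "year": 2026}),
    Chunk("Vacation policy 2019", [1.0, 0.0], {"source": "handbook.pdf", "year": 2019}),
    Chunk("Holiday blog post", [0.95, 0.05], {"source": "blog.html", "year": 2026}),
    Chunk("Expense policy 2026", [0.1, 0.9], {"source": "handbook.pdf", "year": 2026}),
]

hits = filtered_search(chunks, [1.0, 0.0], {"source": "handbook.pdf", "year": 2026}, k=4)
for hit in hits:
    print(hit.text)
```

Only the two chunks that pass the filter are ever scored, so asking for k=4 returns two results instead of padding the context with off-source matches.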
MMR — diversity, not duplicates
Top-k often returns near-duplicate chunks. Maximal Marginal Relevance balances relevance with novelty, so the context covers more ground.
top-k: [A] [A'] [A''] [B]   ← three near-copies of A
MMR:   [A] [B]  [C]   [D]   ← relevant AND diverse
retriever = store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},  # fetch 20 candidates, return the 4 picked by MMR
)
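What the retriever does with `k` and `fetch_k` can be sketched as a greedy loop. The version below is an illustration under stated assumptions, not the library's implementation: the `mmr` function, the `lambda_mult` weight (0.5 here), and the toy 3-D vectors are all made up for the example. Each round picks the candidate with the best score of `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the similarity to the closest chunk already selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query_vec, doc_vecs, k=4, fetch_k=20, lambda_mult=0.5):
    """Return indices of up to k documents chosen greedily by Maximal Marginal Relevance."""
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    # Step 1: plain similarity search narrows the pool to the fetch_k best candidates.
    pool = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)[:fetch_k]
    selected = []
    # Step 2: repeatedly take the candidate that is relevant but unlike what we already have.
    while pool and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


# Hypothetical toy data: A, A' and A'' are near-copies; B and C cover other ground.
docs = {
    "A": [1.0, 0.0, 0.0],
    "A'": [0.95, 0.1, 0.0],
    "A''": [0.9, 0.0, 0.2],
    "B": [0.6, 0.8, 0.0],
    "C": [0.5, 0.0, 0.85],
}
names = list(docs)
vecs = list(docs.values())
query = [1.0, 0.3, 0.3]

top_k = sorted(range(len(vecs)), key=lambda i: cosine(query, vecs[i]), reverse=True)[:3]
top_k_names = [names[i] for i in top_k]
mmr_names = [names[i] for i in mmr(query, vecs, k=3, fetch_k=5)]
print("top-k:", top_k_names)
print("MMR:  ", mmr_names)
```

On this toy data, plain top-k spends all three slots on the A family, while MMR keeps one A variant and gives the other slots to B and C. A larger `fetch_k` gives MMR more candidates to diversify over, at the cost of more similarity comparisons.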
Small-to-big (parent-document) retrieval
Embed small chunks for precise matching, but return the larger parent passage to the LLM so it has full context.
match on: small precise chunk
return to LLM: its bigger parent section
This avoids the "matched the right line but lost the surrounding explanation" problem.
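The match-small, return-big step is mostly bookkeeping: every small chunk carries the id of its parent. A minimal sketch follows, assuming hypothetical helpers (`split_into_children`, `overlap_score`, `small_to_big`) and a toy word-overlap score standing in for embedding similarity so the example runs without a model.

```python
def split_into_children(parents, size=8):
    """Split each parent passage into small word windows that remember their parent id."""
    children = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), size):
            children.append(
                {"parent_id": parent_id, "text": " ".join(words[start:start + size])}
            )
    return children


def overlap_score(query, text):
    """Toy stand-in for embedding similarity: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def small_to_big(query, parents, children, k=2):
    """Match on small chunks, then return the deduplicated parent passages."""
    ranked = sorted(children, key=lambda c: overlap_score(query, c["text"]), reverse=True)
    parent_ids = []
    for child in ranked[:k]:
        if child["parent_id"] not in parent_ids:
            parent_ids.append(child["parent_id"])
    return [parents[pid] for pid in parent_ids]


# Hypothetical parent passages; in practice these would be sections of a document.
parents = {
    "leave": "Employees accrue vacation monthly. Unused vacation days carry over "
             "for one year. Requests need manager approval two weeks ahead.",
    "expenses": "Submit receipts within thirty days. Meals are reimbursed up to "
                "a daily limit. Travel must be booked through the portal.",
}
children = split_into_children(parents)
context = small_to_big("vacation days carry over", parents, children, k=2)
print(context)
```

Here the two best-matching small chunks both belong to the "leave" passage, so the LLM receives that one parent once, including the approval rule that neither small chunk contained.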
How they combine
query ─▶ metadata filter ─▶ vector search (MMR) ─▶ expand to parents ─▶ context
Match small for precision, return big for context, filter by metadata for relevance, and use MMR so the context isn't four copies of the same paragraph.
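The whole pipeline fits in one function. This is a sketch under assumptions, not a production retriever: the `retrieve` function, the dictionary layout of `children`, and the toy vectors are hypothetical, and the four numbered comments map to the four stages in the diagram.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, children, parents, metadata_filter,
             k=2, fetch_k=10, lambda_mult=0.5):
    # 1. Metadata filter: shrink the search space before any scoring.
    pool = [
        c for c in children
        if all(c["metadata"].get(key) == val for key, val in metadata_filter.items())
    ]
    # 2. Vector search: keep the fetch_k most relevant small chunks.
    pool.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    pool = pool[:fetch_k]
    # 3. MMR: greedily pick chunks that are relevant but unlike those already picked.
    picked = []
    while pool and len(picked) < k:
        def score(c):
            redundancy = max(
                (cosine(c["vector"], p["vector"]) for p in picked), default=0.0
            )
            return (lambda_mult * cosine(query_vec, c["vector"])
                    - (1 - lambda_mult) * redundancy)
        best = max(pool, key=score)
        picked.append(best)
        pool.remove(best)
    # 4. Expand to parents, dropping duplicates while keeping rank order.
    parent_ids = list(dict.fromkeys(c["parent_id"] for c in picked))
    return [parents[pid] for pid in parent_ids]


# Hypothetical toy corpus: two near-duplicate chunks from "leave", one chunk from
# "expenses", and a highly similar but outdated chunk that the filter must exclude.
parents = {
    "leave": "Full 2026 leave section ...",
    "expenses": "Full 2026 expenses section ...",
    "old_leave": "Full 2019 leave section ...",
}
children = [
    {"parent_id": "leave", "vector": [1.0, 0.0, 0.0], "metadata": {"year": 2026}},
    {"parent_id": "leave", "vector": [0.95, 0.1, 0.0], "metadata": {"year": 2026}},
    {"parent_id": "expenses", "vector": [0.6, 0.8, 0.0], "metadata": {"year": 2026}},
    {"parent_id": "old_leave", "vector": [0.9, 0.0, 0.2], "metadata": {"year": 2019}},
]

context = retrieve([1.0, 0.3, 0.3], children, parents, {"year": 2026}, k=2)
print(context)
```

In this toy run the 2019 chunk never reaches scoring, MMR skips the second near-duplicate "leave" chunk in favor of the "expenses" chunk, and the final context holds two distinct parent sections.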