13. Advanced Retrieval Techniques

Plain top-k cosine search is the baseline. These techniques fix its common failure modes: redundant results, missing context, and irrelevant matches.

Metadata filtering

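Restrict the search space before scoring by filtering on source, date, type, or section. The search scores fewer vectors, so it is usually faster, and chunks from irrelevant documents cannot appear in the results. Filter syntax, and whether the filter runs before or after scoring, depends on the vector store.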

retriever = store.as_retriever(search_kwargs={
    "k": 4,
    "filter": {"source": "handbook.pdf", "year": 2026},
})
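To make the idea concrete without a vector store, here is a minimal pure-Python sketch of filter-then-score. The chunk layout (`text`, `vector`, `metadata`) and the helper `filtered_search` are hypothetical names chosen for illustration, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, chunks, metadata_filter, k=4):
    """Hypothetical helper: drop non-matching chunks, then rank the rest."""
    # Keep only chunks whose metadata equals the filter on every key.
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    # Score only the survivors and return the k most similar.
    candidates.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return candidates[:k]

# Toy data (assumed for illustration): 2-dimensional "embeddings".
chunks = [
    {"text": "2026 leave policy", "vector": [0.9, 0.1],
     "metadata": {"source": "handbook.pdf", "year": 2026}},
    {"text": "2024 leave policy", "vector": [1.0, 0.0],
     "metadata": {"source": "handbook.pdf", "year": 2024}},
    {"text": "Release notes", "vector": [0.8, 0.2],
     "metadata": {"source": "changelog.md", "year": 2026}},
]

results = filtered_search([1.0, 0.0], chunks,
                          {"source": "handbook.pdf", "year": 2026})
print([c["text"] for c in results])  # ['2026 leave policy']
```

The 2024 chunk is the closest vector to the query, but the filter removes it before any scoring happens, which is the point of filtering first.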

MMR — diversity, not duplicates

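Top-k often returns near-duplicate chunks. Maximal Marginal Relevance (MMR) first fetches a larger candidate pool (fetch_k), then picks k results one at a time, trading relevance to the query against similarity to the results already chosen, so the context covers more ground.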

top-k:  [A] [A'] [A''] [B]   ← three near-copies of A
MMR:    [A] [B]  [C]   [D]   ← relevant AND diverse

retriever = store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20},
)
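The selection step can be sketched in a few lines of plain Python. This is an illustrative greedy MMR, not the retriever's actual implementation; the function name `mmr`, the weight `lambda_mult` (1.0 means pure relevance, 0.0 means pure diversity), and the toy 3-dimensional vectors are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, candidates, k=4, lambda_mult=0.5):
    """Greedy MMR over (id, vector) pairs; returns the selected ids in order."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            relevance = cosine(query_vec, item[1])
            # Highest similarity to anything already picked (0 if none yet).
            redundancy = max(
                (cosine(item[1], s[1]) for s in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [item[0] for item in selected]

# Toy candidate pool (assumed): A, A', A'' point almost the same way.
query = [1.0, 0.0, 0.0]
pool = [
    ("A",   [1.0, 0.30, 0.0]),
    ("A'",  [1.0, 0.32, 0.0]),
    ("A''", [1.0, 0.34, 0.0]),
    ("B",   [1.0, -0.40, 0.0]),
    ("C",   [1.0, 0.0, 0.5]),
    ("D",   [1.0, 0.0, -0.6]),
]

# Plain top-k: rank by relevance only.
top_k = [
    name for name, _ in
    sorted(pool, key=lambda item: cosine(query, item[1]), reverse=True)[:4]
]
print(top_k)                    # ['A', "A'", "A''", 'B']
print(mmr(query, pool, k=4))    # ['A', 'B', 'C', 'D']
```

Once A is selected, A' and A'' are penalized for being nearly identical to it, so the less relevant but different B, C, and D win the remaining slots.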

Small-to-big (parent-document) retrieval

Embed small chunks for precise matching, but return the larger parent passage to the LLM so it has full context.

match on: small precise chunk
return to LLM: its bigger parent section

This avoids the "matched the right line but lost the surrounding explanation" problem.
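A rough sketch of the lookup, assuming each small chunk stores the id of the section it was cut from. The names `parents`, `children`, `parent_id`, and `small_to_big` are hypothetical, and the toy vectors are compared with a plain dot product for brevity.

```python
def dot(a, b):
    """Dot product; stands in for similarity on these toy vectors."""
    return sum(x * y for x, y in zip(a, b))

def small_to_big(query_vec, children, parents, k=2):
    """Match on small chunks, return up to k distinct parent passages."""
    ranked = sorted(children, key=lambda c: dot(query_vec, c["vector"]),
                    reverse=True)
    parent_ids = []
    for child in ranked:
        # Several children can share a parent; keep each parent once,
        # in order of its best-matching child.
        if child["parent_id"] not in parent_ids:
            parent_ids.append(child["parent_id"])
        if len(parent_ids) == k:
            break
    return [parents[pid] for pid in parent_ids]

# Toy data (assumed): full sections and the small chunks cut from them.
parents = {
    "refunds": "Refunds take 5 days. A receipt is required. Contact support to start.",
    "shipping": "Shipping is free over $50. Orders leave the warehouse within 24 hours.",
}
children = [
    {"text": "Refunds take 5 days.", "vector": [0.9, 0.1], "parent_id": "refunds"},
    {"text": "A receipt is required.", "vector": [0.8, 0.3], "parent_id": "refunds"},
    {"text": "Shipping is free over $50.", "vector": [0.1, 0.9], "parent_id": "shipping"},
]

context = small_to_big([1.0, 0.0], children, parents, k=2)
print(len(context))  # 2
```

Both refund chunks outrank the shipping chunk, yet the refunds section is returned only once, followed by the shipping section.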

How they combine

query ─▶ metadata filter ─▶ vector search (MMR) ─▶ expand to parents ─▶ context

Match small for precision, return big for context, filter by metadata for relevance, and use MMR so the context isn't four copies of the same paragraph.
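The pipeline in the diagram can be sketched end to end as one function. This is a toy composition under assumed names (`retrieve`, `parent_id`, `lambda_mult`) and an assumed chunk layout, not a drop-in replacement for a real retriever.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, children, parents, metadata_filter,
             k=2, fetch_k=4, lambda_mult=0.5):
    # 1. Metadata filter: discard chunks that fail any filter key.
    pool = [c for c in children
            if all(c["metadata"].get(f) == v
                   for f, v in metadata_filter.items())]
    # 2. Vector search: keep the fetch_k most relevant, then apply MMR.
    pool = sorted(pool, key=lambda c: cosine(query_vec, c["vector"]),
                  reverse=True)[:fetch_k]
    selected = []
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((cosine(c["vector"], s["vector"])
                              for s in selected), default=0.0)
            return (lambda_mult * cosine(query_vec, c["vector"])
                    - (1 - lambda_mult) * redundancy)
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    # 3. Expand to parents, keeping each parent once in selection order.
    parent_ids = list(dict.fromkeys(c["parent_id"] for c in selected))
    return [parents[pid] for pid in parent_ids]

# Toy data (assumed for illustration).
parents = {"p1": "Parent section 1", "p2": "Parent section 2",
           "p3": "Parent section 3"}
children = [
    {"vector": [1.0, 0.30, 0.0], "parent_id": "p1",
     "metadata": {"source": "handbook.pdf"}},
    {"vector": [1.0, 0.32, 0.0], "parent_id": "p1",
     "metadata": {"source": "handbook.pdf"}},
    {"vector": [1.0, -0.40, 0.0], "parent_id": "p2",
     "metadata": {"source": "handbook.pdf"}},
    {"vector": [1.0, 0.0, 0.0], "parent_id": "p3",
     "metadata": {"source": "changelog.md"}},
]

context = retrieve([1.0, 0.0, 0.0], children, parents,
                   {"source": "handbook.pdf"})
print(context)  # ['Parent section 1', 'Parent section 2']
```

The changelog chunk is the best raw match but is filtered out, MMR skips the second near-duplicate from p1 in favor of the p2 chunk, and the caller receives two distinct parent sections.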