13. Advanced Retrieval Techniques

Plain top-k cosine search is the baseline. These techniques fix its common failure modes: redundant results, missing context, and irrelevant matches.

Metadata filtering

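Restrict the search space before scoring, for example by source, date, type, or section. Scoring fewer candidates is usually faster, and the results are more precise because out-of-scope chunks can never be returned. Filter syntax, and whether the filter runs before or after the similarity search, depends on the vector store.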

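# Assumes `store` is an existing vector store; filter syntax varies by backend.
retriever = store.as_retriever(search_kwargs={
    "k": 4,  # number of chunks to return
    "filter": {"source": "handbook.pdf", "year": 2026},  # matched against chunk metadata
})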
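
To make the filter-then-score order concrete, here is a minimal pure-Python sketch. It is not how any particular vector store implements filtering. The `filtered_search` helper, the chunk layout, and the 2-d toy vectors are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, chunks, metadata_filter, k=4):
    """Hypothetical helper: drop non-matching chunks, then rank the rest."""
    candidates = [
        c for c in chunks
        if all(c["metadata"].get(key) == value
               for key, value in metadata_filter.items())
    ]
    candidates.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return candidates[:k]

# Toy data: 2-d vectors stand in for real embeddings.
chunks = [
    {"text": "PTO policy 2026", "vector": [0.9, 0.1],
     "metadata": {"source": "handbook.pdf", "year": 2026}},
    {"text": "PTO policy 2019", "vector": [1.0, 0.0],
     "metadata": {"source": "handbook.pdf", "year": 2019}},
    {"text": "Expense rules", "vector": [0.2, 0.8],
     "metadata": {"source": "handbook.pdf", "year": 2026}},
]

hits = filtered_search([1.0, 0.0], chunks,
                       {"source": "handbook.pdf", "year": 2026}, k=2)
print([h["text"] for h in hits])
```

The 2019 chunk is the closest vector to the query, but it never reaches the scoring step because its metadata does not match the filter.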

MMR — diversity, not duplicates

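Plain top-k often returns near-duplicate chunks, which spends context on repeated information. Maximal Marginal Relevance (MMR) picks results one at a time, scoring each candidate by its relevance to the query minus its similarity to the results already chosen, so the context covers more ground.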

 top-k:  [A] [A'] [A''] [B]     ← three near-copies of A
 MMR:    [A] [B]  [C]   [D]     ← relevant AND diverse

# fetch_k candidates are retrieved by similarity, then MMR selects k of them.
retriever = store.as_retriever(search_type="mmr",
                               search_kwargs={"k": 4, "fetch_k": 20})
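
The selection step can be sketched as a greedy loop. This is an illustrative sketch rather than the library's actual code. The `mmr` and `unit` helpers, the `lambda_mult` weight of 0.5, and the toy 2-d vectors are assumptions, and the candidate pool plays the role of the `fetch_k` results.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, candidates, k=4, lambda_mult=0.5):
    """Greedy MMR over (label, vector) pairs (hypothetical helper).

    Each round picks the candidate that maximizes
    lambda_mult * relevance - (1 - lambda_mult) * redundancy,
    where redundancy is the highest similarity to anything already selected.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            relevance = cosine(query_vec, item[1])
            redundancy = max((cosine(item[1], s[1]) for s in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

def unit(deg):
    """2-d unit vector at the given angle; stands in for a real embedding."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]

query = unit(0)
pool = [("A", unit(5)), ("A'", unit(7)), ("A''", unit(9)),
        ("B", unit(-45)), ("C", unit(-20))]

top_k = sorted(pool, key=lambda c: cosine(query, c[1]), reverse=True)[:3]
diverse = mmr(query, pool, k=3)

print([label for label, _ in top_k])    # ['A', "A'", "A''"]
print([label for label, _ in diverse])  # ['A', 'B', 'C']
```

Raising `lambda_mult` toward 1 makes the loop behave like plain top-k. Lowering it pushes harder toward novelty at the cost of relevance.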

Small-to-big (parent-document) retrieval

Embed small chunks for precise matching, but return the larger parent passage to the LLM so it has full context.

 match on:       small precise chunk
 return to LLM:  its bigger parent section

This avoids the "matched the right line but lost the surrounding explanation" problem.
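
A minimal sketch of the idea, assuming each small chunk stores the ID of the section it came from. The `small_to_big` helper, the `parent_id` field, and the sample texts are hypothetical; a real setup would keep the parents in a document store next to the vector index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical parent store: parent_id -> full section text.
parents = {
    "sec-1": ("Refunds. Customers may request a refund within 30 days. "
              "Refunds go back to the original payment method."),
    "sec-2": "Shipping. Standard shipping takes 5 days.",
}

# Only the small chunks are embedded; each one records its parent.
children = [
    {"text": "refund within 30 days", "vector": [0.9, 0.1], "parent_id": "sec-1"},
    {"text": "original payment method", "vector": [0.8, 0.3], "parent_id": "sec-1"},
    {"text": "shipping takes 5 days", "vector": [0.1, 0.9], "parent_id": "sec-2"},
]

def small_to_big(query_vec, children, parents, k=2):
    """Match on small chunks, return their parents without duplicates."""
    ranked = sorted(children,
                    key=lambda c: cosine(query_vec, c["vector"]),
                    reverse=True)
    seen, result = set(), []
    for child in ranked[:k]:
        pid = child["parent_id"]
        if pid not in seen:  # two children may share one parent
            seen.add(pid)
            result.append(parents[pid])
    return result

context = small_to_big([1.0, 0.0], children, parents, k=2)
print(context)
```

Both top matches here belong to the same section, so the LLM receives that section once rather than two overlapping fragments.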

How they combine

 query ─▶ metadata filter ─▶ vector search (MMR) ─▶ expand to parents ─▶ context

Match small for precision, return big for context, filter by metadata for relevance, and use MMR so the context isn't four copies of the same paragraph.
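
The whole pipeline can be sketched in one function. It is a toy composition under stated assumptions and not a production retriever. The `retrieve` function, the chunk layout, the `parent_id` field, and the 2-d vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def unit(deg):
    """2-d unit vector at the given angle; stands in for a real embedding."""
    rad = math.radians(deg)
    return [math.cos(rad), math.sin(rad)]

def retrieve(query_vec, children, parents, metadata_filter,
             k=2, fetch_k=4, lambda_mult=0.5):
    # 1. Metadata filter: discard chunks outside the requested scope.
    pool = [c for c in children
            if all(c["metadata"].get(f) == v
                   for f, v in metadata_filter.items())]

    # 2. Vector search: keep the fetch_k most similar, then MMR-select k.
    pool.sort(key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    pool = pool[:fetch_k]
    selected = []
    while pool and len(selected) < k:
        def score(c):
            redundancy = max((cosine(c["vector"], s["vector"])
                              for s in selected), default=0.0)
            relevance = cosine(query_vec, c["vector"])
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)

    # 3. Expand to parents, keeping each parent once.
    seen, context = set(), []
    for c in selected:
        if c["parent_id"] not in seen:
            seen.add(c["parent_id"])
            context.append(parents[c["parent_id"]])
    return context

parents = {"p1": "Section on refunds", "p2": "Section on shipping",
           "p3": "Outdated refunds section"}
children = [
    {"vector": unit(5), "parent_id": "p1", "metadata": {"year": 2026}},
    {"vector": unit(7), "parent_id": "p1", "metadata": {"year": 2026}},
    {"vector": unit(-45), "parent_id": "p2", "metadata": {"year": 2026}},
    {"vector": unit(0), "parent_id": "p3", "metadata": {"year": 2019}},
]

context = retrieve(unit(0), children, parents, {"year": 2026})
print(context)
```

In this toy run the 2019 chunk is the closest match but is filtered out, MMR skips the near-duplicate of the first hit, and the two selected chunks expand to two distinct parent sections.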

Next: Multi-Query RAG →
