14. Multi-Query RAG
A single phrasing of a question can miss relevant chunks that were written with different words. Multi-query retrieval generates several rephrasings, retrieves for each, and merges the results.
The idea
                ┌─▶ "How do I reset my password?"
user question ──┼─▶ "steps to recover a forgotten login" ─▶ retrieve each
                └─▶ "change account credentials"                  │
                                                                  ▼
                                                            union + dedup ─▶ context
Each variant can surface chunks the others miss, so the merged set usually has higher recall than any single query.
Code
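def multi_query(question, retrieve, n=3):
    # Assumes llm(prompt) -> str is defined elsewhere, and that
    # retrieve(q) yields (score, chunk) pairs where a higher score is better.
    prompt = f"Generate {n} alternative phrasings of this question, one per line:\n{question}"
    # always keep the original question; skip blank lines in the model output
    variants = [question] + [v.strip() for v in llm(prompt).splitlines() if v.strip()]
    hits = {}  # chunk id -> (best score, chunk)
    for q in variants:
        for score, chunk in retrieve(q):
            # keep the best score seen for each unique chunk
            if chunk["id"] not in hits or score > hits[chunk["id"]][0]:
                hits[chunk["id"]] = (score, chunk)
    # unique chunks, best score first
    ranked = sorted(hits.values(), key=lambda h: h[0], reverse=True)
    return [chunk for _, chunk in ranked]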
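To try the union-and-dedup step without a model or a vector store, here is a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `fake_llm` returns canned rephrasings instead of calling a model, `keyword_retrieve` is a toy word-overlap scorer standing in for a real retriever, `CORPUS` is invented, and this version of `multi_query` takes the `llm` callable as an explicit argument. Higher scores are assumed to be better.

```python
import re

# Hypothetical three-chunk corpus, invented for this example.
CORPUS = [
    {"id": "a", "text": "To reset your password open Settings and choose Reset"},
    {"id": "b", "text": "Recover a forgotten login from the sign-in page"},
    {"id": "c", "text": "Invoices are emailed on the first day of each month"},
]

def tokens(text):
    """Lowercase word set; a crude stand-in for real text matching."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_retrieve(query, k=2):
    """Toy retriever: score = number of words shared with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(c["text"])), c) for c in CORPUS]
    scored = [(s, c) for s, c in scored if s > 0]  # drop non-matches
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:k]

def fake_llm(prompt):
    """Stand-in for a model call: canned rephrasings, one per line."""
    return "steps to recover a forgotten login\nchange account credentials\n"

def multi_query(question, retrieve, llm, n=3):
    prompt = f"Generate {n} alternative phrasings of this question, one per line:\n{question}"
    variants = [question] + [v.strip() for v in llm(prompt).splitlines() if v.strip()]
    best = {}  # chunk id -> (best score, chunk)
    for q in variants:
        for score, chunk in retrieve(q):
            if chunk["id"] not in best or score > best[chunk["id"]][0]:
                best[chunk["id"]] = (score, chunk)
    ranked = sorted(best.values(), key=lambda sc: sc[0], reverse=True)
    return [chunk for _, chunk in ranked]

question = "How do I reset my password?"
single = [c["id"] for _, c in keyword_retrieve(question)]
merged = [c["id"] for c in multi_query(question, keyword_retrieve, fake_llm)]
print(single)  # chunk ids found by the original phrasing alone
print(merged)  # chunk ids after merging all variants
```

In this toy setup the original phrasing shares no words with chunk "b", so only the rephrased query reaches it. The third variant matches nothing at all, which is harmless: it simply contributes no hits to the merge.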
In LangChain this is built in:
from langchain.retrievers.multi_query import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(retriever=base_retriever, llm=llm)
Trade-off
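More variants usually mean better recall, but each variant adds a retrieval call, on top of the one extra LLM call that generates them. With n variants plus the original question, that is n + 1 retrievals per user question. Three or four variants is usually plenty.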
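The cost side of this trade-off is easy to count. The sketch below is an illustration under assumptions, not a benchmark: `CallCounter` is a hypothetical wrapper, and `stub_llm` and `stub_retrieve` are stand-ins that return fixed values. It tallies how many LLM and retrieval calls one user question triggers when the original question is retrieved alongside its generated variants.

```python
class CallCounter:
    """Hypothetical helper: wraps a callable and counts its invocations."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.fn(*args, **kwargs)

def stub_llm(prompt):
    # Stand-in for a model: always returns three rephrasings.
    return "variant one\nvariant two\nvariant three"

def stub_retrieve(query):
    # Stand-in for a retriever: one fixed hit per query.
    return [(1.0, {"id": query})]

def count_calls(question, n):
    llm = CallCounter(stub_llm)
    retrieve = CallCounter(stub_retrieve)
    prompt = f"Generate {n} alternative phrasings of this question, one per line:\n{question}"
    # one LLM call produces the variants; cap at n in case the model returns more
    variants = [question] + llm(prompt).splitlines()[:n]
    for q in variants:
        retrieve(q)  # one retrieval call per variant, including the original
    return llm.calls, retrieve.calls

llm_calls, retrieval_calls = count_calls("How do I reset my password?", n=3)
print(llm_calls, retrieval_calls)
```

With n=3 this prints `1 4`: one generation call and four retrievals for a single user question.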
Multi-query trades a little extra compute for higher recall, which pays off most when users phrase things very differently from your documents.