
14. Multi-Query RAG

A single phrasing of a question can miss relevant chunks that were written with different words. Multi-query retrieval generates several rephrasings, retrieves for each, and merges the results.

The idea

                ┌─▶ "How do I reset my password?"
user question ──┼─▶ "steps to recover a forgotten login" ─▶ retrieve each
                └─▶ "change account credentials"                  │
                                                                  │
                                                            union + dedup ─▶ context

Each variant can surface chunks the others miss, so the merged result set usually has higher recall than any single query.
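
To make the recall claim concrete, here is a small worked example. The chunk ids, the "relevant" set, and the per-query results are all made up for illustration; in practice you rarely know the full relevant set outside of an evaluation dataset.

```python
# Hypothetical chunk ids; the relevant set is assumed known for illustration.
relevant = {"c1", "c2", "c3", "c4"}

# What each phrasing happened to retrieve (made-up results).
retrieved = {
    "How do I reset my password?": {"c1", "c2", "c9"},
    "steps to recover a forgotten login": {"c2", "c3"},
    "change account credentials": {"c4", "c9"},
}

def recall(found, relevant):
    """Fraction of the relevant chunks that were found."""
    return len(found & relevant) / len(relevant)

# Recall of each phrasing on its own.
per_query = {q: recall(ids, relevant) for q, ids in retrieved.items()}

# Union + dedup: a set union merges the results and drops duplicates.
merged = set().union(*retrieved.values())
merged_recall = recall(merged, relevant)
```

In this toy data no single phrasing finds more than half of the relevant chunks, while the merged set finds all four. It also picks up the irrelevant chunk c9, which is one reason the merged results are typically re-scored or re-ranked before being used as context.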

Code

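def multi_query(question, retrieve, n=3):
    # llm is assumed to be a callable defined elsewhere: prompt string in, text out
    prompt = f"Generate {n} alternative phrasings of this question, one per line:\n{question}"
    # drop blank lines the model may emit between phrasings
    variants = [question] + [v.strip() for v in llm(prompt).splitlines() if v.strip()]

    hits = {}  # chunk id -> (best score, chunk)
    for q in variants:
        for score, chunk in retrieve(q):
            # keep the best score seen for each unique chunk
            if chunk["id"] not in hits or score > hits[chunk["id"]][0]:
                hits[chunk["id"]] = (score, chunk)
    # unique chunks as (score, chunk) pairs, best score first
    return sorted(hits.values(), key=lambda hit: hit[0], reverse=True)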
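
To try the idea without a vector store or an API key, the sketch below runs the same merge logic end to end against stand-ins. Everything here is hypothetical scaffolding: `DOCS`, `keyword_retrieve` (a toy word-overlap scorer), `fake_llm` (canned rephrasings), and `multi_query_ranked` (a variant that takes the LLM as an argument so it can be stubbed).

```python
DOCS = [
    {"id": "d1", "text": "reset your password from the settings page"},
    {"id": "d2", "text": "recover a forgotten login via email link"},
    {"id": "d3", "text": "change account credentials in your profile"},
    {"id": "d4", "text": "billing and invoices overview"},
]

def keyword_retrieve(query, k=2):
    """Toy retriever: score = number of query words found in the chunk text."""
    words = set(query.lower().replace("?", "").split())
    scored = [(len(words & set(d["text"].split())), d) for d in DOCS]
    scored = [(s, d) for s, d in scored if s > 0]
    return sorted(scored, key=lambda sd: sd[0], reverse=True)[:k]

def fake_llm(prompt):
    """Stand-in for a real LLM call; returns canned rephrasings."""
    return "steps to recover a forgotten login\n\nchange account credentials\n"

def multi_query_ranked(question, retrieve, llm, n=3):
    prompt = f"Generate {n} alternative phrasings of this question, one per line:\n{question}"
    # Skip blank lines so they are not sent to the retriever as queries.
    variants = [question] + [v.strip() for v in llm(prompt).splitlines() if v.strip()]
    best = {}  # chunk id -> (best score, chunk)
    for q in variants:
        for score, chunk in retrieve(q):
            if chunk["id"] not in best or score > best[chunk["id"]][0]:
                best[chunk["id"]] = (score, chunk)
    ranked = sorted(best.values(), key=lambda sc: sc[0], reverse=True)
    return [(score, chunk["id"]) for score, chunk in ranked]

# The original question alone versus the question plus its rephrasings.
single = [d["id"] for _, d in keyword_retrieve("How do I reset my password?")]
merged = multi_query_ranked("How do I reset my password?", keyword_retrieve, fake_llm)
```

With this toy data the original question alone only reaches d1, while the merged run also returns d2 and d3. Note that scores from different queries are compared directly here, which is only reasonable when the retriever's scores are on a comparable scale across queries.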

In LangChain this is built in:

from langchain.retrievers.multi_query import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(retriever=base_retriever, llm=llm)

Trade-off

More variants improve recall but cost more: n variants plus the original question means n + 1 retrieval calls, on top of one extra LLM call to generate the variants. Three or four variants is usually plenty.
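
One rough way to check whether extra variants are paying off on your own data is to count how many previously unseen chunks each additional variant contributes. The helper `new_chunks_per_variant` and the result lists below are hypothetical, with made-up chunk ids; this is a sketch of the bookkeeping, not a benchmark.

```python
def new_chunks_per_variant(results_per_variant):
    """For each variant, in order, count chunk ids not seen in earlier variants."""
    seen, gains = set(), []
    for ids in results_per_variant:
        fresh = set(ids) - seen
        gains.append(len(fresh))
        seen |= fresh
    return gains

# Made-up retrieval results (chunk ids): the original question plus 4 variants.
results = [
    ["c1", "c2", "c3"],
    ["c2", "c4", "c5"],
    ["c1", "c5", "c6"],
    ["c2", "c3", "c6"],
    ["c1", "c4", "c5"],
]

gains = new_chunks_per_variant(results)
retrieval_calls = len(results)  # one retrieval per variant, original included
llm_calls = 1                   # one call generates all the rephrasings
```

In this invented example the last two variants add no new chunks, so they only add retrieval cost. Running the same count over a sample of real queries gives a data-driven way to pick the number of variants.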

Multi-query trades a little extra compute for higher recall, which pays off most when users phrase things very differently from your documents.

Next: Reciprocal Rank Fusion →