14. Multi-Query RAG

A single phrasing of a question can miss relevant chunks that were written with different words. Multi-query retrieval generates several rephrasings, retrieves for each, and merges the results.

The idea

                 ┌─▶ "How do I reset my password?"
 user question ──┼─▶ "steps to recover a forgotten login"   ─▶ retrieve each
                 └─▶ "change account credentials"               │
                                                                ▼
                                                    union + dedup ─▶ context

Each variant can surface chunks the others miss, so overall recall tends to go up.
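To make the "union + dedup" box concrete, here is a minimal sketch. The chunk ids and the per-variant result lists are invented for illustration, and `union_dedup` is a hypothetical helper name, not part of any library.

```python
def union_dedup(result_lists):
    """Merge several ranked lists of chunk ids, keeping first-seen order and dropping repeats."""
    seen = set()
    merged = []
    for results in result_lists:
        for chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


# Made-up results for the three variants in the diagram
per_variant = [
    ["c1", "c2"],  # "How do I reset my password?"
    ["c2", "c7"],  # "steps to recover a forgotten login"
    ["c9", "c1"],  # "change account credentials"
]
merged = union_dedup(per_variant)
print(merged)  # ['c1', 'c2', 'c7', 'c9']
```

No single variant found all four chunks, but the merged list contains each of them once.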

Code

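def multi_query(question, retrieve, llm, n=3):
    """Retrieve for the question plus up to n LLM rephrasings; merge results by chunk id.

    llm(prompt) returns text; retrieve(query) yields (score, chunk) pairs,
    where each chunk is a dict with an "id" key.
    """
    prompt = f"Generate {n} alternative phrasings of this question, one per line:\n{question}"
    # drop blank lines and cap at n: the model may not follow the format exactly
    rephrasings = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    variants = [question] + rephrasings[:n]

    hits = {}  # chunk id -> (best score, chunk)
    for q in variants:
        for score, chunk in retrieve(q):
            # keep the best score seen for each unique chunk
            if chunk["id"] not in hits or score > hits[chunk["id"]][0]:
                hits[chunk["id"]] = (score, chunk)
    # return the chunks themselves, highest score first
    return [chunk for _, chunk in sorted(hits.values(), key=lambda h: h[0], reverse=True)]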
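Model output is rarely as tidy as "one per line" suggests: it may contain blank lines, list markers, or a repeat of the original question. The following is a sketch of one way to clean it up before retrieving. `clean_variants` is a hypothetical helper, and the set of list markers it strips is an assumption about typical output.

```python
import re


def clean_variants(raw, question, n=3):
    """Turn raw LLM output into at most n distinct rephrasings."""
    variants = []
    seen = {question.strip().lower()}  # never re-issue the original question
    for line in raw.splitlines():
        # strip list markers such as "1.", "2)", "-", "*" or "•"
        text = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        key = text.lower()
        if text and key not in seen:
            seen.add(key)
            variants.append(text)
    return variants[:n]


raw = (
    "1. steps to recover a forgotten login\n"
    "\n"
    "- change account credentials\n"
    "2) How do I reset my password?\n"
)
cleaned = clean_variants(raw, "How do I reset my password?")
print(cleaned)
# ['steps to recover a forgotten login', 'change account credentials']
```

Skipping duplicates before retrieval also avoids paying for the same search twice.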

In LangChain this is built in:

from langchain.retrievers.multi_query import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(retriever=base_retriever, llm=llm)

Trade-off

More variants generally improve recall, but each one adds a retrieval call, on top of the one extra LLM call needed to generate them. Three or four variants is usually plenty.
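The cost side of the trade-off is easy to count. The sketch below uses stub functions in place of a real model and a real vector search (both are placeholders, not real APIs) and counts the calls one lookup makes for a given number of rephrasings.

```python
def count_calls(n):
    """Count the LLM and retrieval calls one multi-query lookup makes with n rephrasings."""
    calls = {"llm": 0, "retrieve": 0}

    def stub_llm(prompt):
        # stand-in for a real model: returns n placeholder rephrasings
        calls["llm"] += 1
        return "\n".join(f"rephrasing {i}" for i in range(n))

    def stub_retrieve(query):
        # stand-in for a real vector search
        calls["retrieve"] += 1
        return []

    variants = ["original question"] + stub_llm("...").splitlines()
    for q in variants:
        stub_retrieve(q)
    return calls


for n in (1, 3, 8):
    print(n, count_calls(n))
# 1 {'llm': 1, 'retrieve': 2}
# 3 {'llm': 1, 'retrieve': 4}
# 8 {'llm': 1, 'retrieve': 9}
```

The LLM cost stays at one call, while retrieval calls grow as n + 1, counting the original question.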

Multi-query trades a little extra compute for higher recall, which pays off when users phrase things very differently from your documents.

Next: Reciprocal Rank Fusion →
