7. Conversational RAG

Single-shot RAG answers one isolated question. In a chat, questions depend on what came before — and that breaks naive retrieval.

The follow-up problem

User: "What does the refund policy say?"
Bot: "...30 days..."
User: "And for digital items?" ← embed THIS alone → retrieves nothing useful

"And for digital items?" has no standalone meaning. Embedding it on its own retrieves irrelevant chunks because the subject ("refund policy") appears only in the previous turn.
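The failure can be reproduced without an embedding model. The sketch below is a toy: word-overlap (Jaccard) similarity stands in for embedding similarity, and the three documents are invented for illustration. A real embedding model will score differently, but the underlying issue is the same: the bare follow-up carries no trace of "refund policy".

```python
import re

# Invented toy corpus, for illustration only.
DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The refund policy for digital items differs once the file has been downloaded.",
    "Shipping for digital items and gift cards is instant.",
]

def tokens(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query, doc):
    """Jaccard similarity of word sets: a crude stand-in for embedding similarity."""
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d)

def best_match(query):
    """Return the document with the highest overlap score."""
    return max(DOCS, key=lambda doc: overlap(query, doc))

follow_up = "And for digital items?"
standalone = "What is the refund policy for digital items?"

print(best_match(follow_up))    # the shipping document (wrong topic)
print(best_match(standalone))   # the digital-items refund document
```

The follow-up alone matches the shipping document because it shares "and", "for", "digital", and "items" with it. Only the standalone question contains the words that point at the refund policy.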

The fix: history-aware query rewriting

Before retrieving, use the LLM to rewrite the follow-up into a standalone question using the chat history, then retrieve with that.

history + follow-up ─▶ LLM rewrites ─▶ "What is the refund policy for digital items?" ─▶ retrieve ─▶ answer

Code

def condense(history, follow_up):
    prompt = (
        "Given the conversation, rewrite the follow-up as a standalone question.\n\n"
        f"Conversation:\n{history}\n\nFollow-up: {follow_up}\n\nStandalone question:"
    )
    return llm(prompt).strip()


def chat_answer(history, follow_up):
    standalone = condense(history, follow_up)  # history-aware
    top, _ = retrieve(standalone)              # retrieve with the rewritten query
    context = "\n\n".join(c["text"] for c in top)
    return llm(f"Context:\n{context}\n\nQuestion: {standalone}")

Practical notes

  • Only condense when needed — for a self-contained first question, skip the rewrite to save a call.
  • Cap history length — pass the last few turns, not the whole transcript, to stay inside the context window.
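Both notes can be folded into one small wrapper. This is a minimal sketch under stated assumptions: history is held as a list of (user, bot) pairs, "needed" is approximated as "there is prior history", and the names recent_turns, format_history, maybe_condense, and fake_condense are hypothetical helpers, not part of any library. The condense function is passed in as condense_fn so the example runs with a stub instead of a real LLM call.

```python
def recent_turns(history, max_turns=3):
    """Return at most the last `max_turns` (user, bot) pairs."""
    if max_turns <= 0:
        return []
    return history[-max_turns:]

def format_history(turns):
    """Render (user, bot) pairs as the plain-text transcript used in the prompt."""
    return "\n".join(f"User: {user}\nBot: {bot}" for user, bot in turns)

def maybe_condense(history, question, condense_fn, max_turns=3):
    """Skip the rewrite on the first turn; otherwise condense with capped history."""
    if not history:
        return question  # nothing to resolve against, so no LLM call
    return condense_fn(format_history(recent_turns(history, max_turns)), question)

# Stub standing in for the LLM-backed condense(); it records what it was given.
calls = []
def fake_condense(history_text, follow_up):
    calls.append(history_text)
    return f"[rewritten] {follow_up}"

# First question: returned unchanged, and the stub is never called.
print(maybe_condense([], "What does the refund policy say?", fake_condense))

# Later question: only the last two turns reach the rewrite step.
history = [("q1", "a1"), ("q2", "a2"), ("q3", "a3"), ("q4", "a4")]
print(maybe_condense(history, "And for digital items?", fake_condense, max_turns=2))
```

Checking for empty history is the cheapest possible test. It still spends a rewrite call on later questions that happen to be self-contained, so a stricter check is a reasonable refinement if call volume matters.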
