6. Your First RAG App

Now we connect indexing, retrieval, and generation into one runnable app.

End-to-end shape

docs ─▶ [index once] ─▶ vector store
question ─▶ retrieve top-k ─▶ build prompt ─▶ LLM ─▶ answer (+ citations)

Code — minimal but complete

from sentence_transformers import SentenceTransformer
import numpy as np

emb = SentenceTransformer("all-MiniLM-L6-v2")

# --- index once ---
# chunk() and load_documents() are your own helpers (parts 3 & 8);
# each chunk is a dict with at least a "text" key.
chunks = chunk(load_documents())
vectors = emb.encode([c["text"] for c in chunks], normalize_embeddings=True)
for c, v in zip(chunks, vectors):
    c["vector"] = v

def retrieve(question, k=4, min_score=0.25):
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    q = emb.encode(question, normalize_embeddings=True)
    scored = sorted(((float(q @ c["vector"]), c) for c in chunks),
                    key=lambda x: x[0], reverse=True)
    # Keep the k best chunks, dropping any below the relevance threshold.
    top = [c for s, c in scored[:k] if s >= min_score]
    return top, (scored[0][0] if scored else 0)

def answer(question):
    top, best = retrieve(question)
    if not top:  # grounding guardrail
        return "I don't have information on that in my sources."
    context = "\n\n".join(c["text"] for c in top)
    prompt = (
        "Answer using ONLY the context. If it's not there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)  # your LLM call
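The listing above leaves chunk(), load_documents(), and llm() to you. To try it before wiring up real data or a model, something like the following stand-ins may be enough. Everything here is an assumption for illustration: the sample document, the "source" key, the word-window chunking with size and overlap parameters, and the echoing llm() stub are hypothetical and are not the versions from parts 3 & 8.

```python
def load_documents():
    """Hypothetical loader: a real one would read files or a database."""
    return [
        {"source": "policy.md",
         "text": "Refunds are issued within 14 days of purchase. "
                 "Contact support to start a return."},
    ]


def chunk(docs, size=50, overlap=10):
    """Split each document into overlapping word windows, keeping its source."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    out = []
    for doc in docs:
        words = doc["text"].split()
        if not words:
            continue  # skip empty documents
        # Stop once the remaining words are already covered by the overlap.
        for start in range(0, max(len(words) - overlap, 1), step):
            out.append({
                "source": doc["source"],
                "text": " ".join(words[start:start + size]),
            })
    return out


def llm(prompt):
    """Stub model call: echoes the question so the pipeline can be exercised."""
    return "[stub] " + prompt.rsplit("Question: ", 1)[-1]
```

Define these before the indexing step runs, then swap each one out for the real thing as you build it. Keeping "source" on every chunk from the start makes citations easier to add later.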

What makes it a product, not a demo

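  • Grounding guardrail: the min_score check returns a fixed refusal when no chunk scores high enough, and the "say you don't know" instruction covers chunks that pass the threshold but don't contain the answer. Together they reduce confident hallucinations; tune min_score (0.25 here) on your own data.
  • Citations: keep each chunk's source metadata and show it with the answer, so readers can verify claims. The code above returns only the answer text.
  • Evaluation: keep a small set of real questions with known-good answers and re-run it after every change (see Reranking & Next Steps).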
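The citations and evaluation points can be sketched together. This is one possible shape, not the series' implementation. It assumes retrieve returns (chunks, best_score) as in the code above and that each chunk carries a "source" key. The names answer_with_citations, run_eval, fake_retrieve, and fake_llm, and the case fields "expect" and "source", are hypothetical. The substring check is a deliberately crude stand-in for a real answer-quality metric.

```python
def answer_with_citations(question, retrieve, llm):
    """Like answer(), but also returns the sources the answer was grounded on."""
    top, _best = retrieve(question)
    if not top:  # same grounding guardrail as answer()
        return {"text": "I don't have information on that in my sources.",
                "sources": []}
    context = "\n\n".join(c["text"] for c in top)
    prompt = (
        "Answer using ONLY the context. If it's not there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # De-duplicate sources while preserving retrieval order.
    sources = list(dict.fromkeys(c.get("source", "unknown") for c in top))
    return {"text": llm(prompt), "sources": sources}


def run_eval(cases, answer_fn):
    """Fraction of cases with the expected phrase and the expected source."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        result = answer_fn(case["question"])
        text_ok = case["expect"].lower() in result["text"].lower()
        source_ok = case["source"] in result["sources"]
        if text_ok and source_ok:
            passed += 1
    return passed / len(cases)


# Toy stand-ins so the sketch runs without an embedding model or an LLM.
def fake_retrieve(question):
    if "refund" in question.lower():
        chunk = {"text": "Refunds are issued within 14 days.",
                 "source": "policy.md"}
        return [chunk], 0.8
    return [], 0.0


def fake_llm(prompt):
    return "Refunds are issued within 14 days."


cases = [{"question": "How long do refunds take?",
          "expect": "14 days", "source": "policy.md"}]
score = run_eval(cases,
                 lambda q: answer_with_citations(q, fake_retrieve, fake_llm))
```

In a real app you would pass the actual retrieve and llm functions and track the score across changes; a drop after a change to chunking or prompts is the signal to investigate.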

Next: Conversational RAG →