6. Your First RAG App

Now we connect indexing, retrieval, and generation into one runnable app.

End-to-end shape

 docs ─▶ [index once] ─▶ vector store
 question ─▶ retrieve top-k ─▶ build prompt ─▶ LLM ─▶ answer (+ citations)

Code — minimal but complete

from sentence_transformers import SentenceTransformer
import numpy as np

emb = SentenceTransformer("all-MiniLM-L6-v2")

# --- index once ---
# chunk() and load_documents() are not defined here: see parts 3 & 8
chunks = chunk(load_documents())
vectors = emb.encode([c["text"] for c in chunks], normalize_embeddings=True)
for c, v in zip(chunks, vectors):
    c["vector"] = v

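def retrieve(question, k=4, min_score=0.25):
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    q = emb.encode(question, normalize_embeddings=True)
    scored = sorted(((float(q @ c["vector"]), c) for c in chunks),
                    key=lambda x: x[0], reverse=True)
    # Keep at most k chunks and drop any that score below min_score.
    top = [c for s, c in scored[:k] if s >= min_score]
    # Also return the best score (0.0 when the index is empty).
    return top, (scored[0][0] if scored else 0.0)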

def answer(question):
    top, _best = retrieve(question)               # best score is unused here
    if not top:                                   # grounding guardrail
        return "I don't have information on that in my sources."
    context = "\n\n".join(c["text"] for c in top)
    prompt = (
        "Answer using ONLY the context. If it's not there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                            # your LLM call
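
The listing above returns only the model's text. If each chunk also carries source metadata, the same context-building step can number the chunks and hand the sources back with the answer. The following is a minimal sketch, not part of the listing: the `source` key on each chunk, the helper names, and the `[1]`-style citation format are all assumptions made for illustration, and `retrieve` and `llm` are passed in so the sketch runs on its own.

```python
def build_context(top):
    """Number each retrieved chunk so the model can cite it as [1], [2], ..."""
    return "\n\n".join(f"[{i}] {c['text']}" for i, c in enumerate(top, start=1))


def unique_sources(top):
    """Collect sources in retrieval order, dropping duplicates."""
    seen = []
    for c in top:
        src = c.get("source", "unknown")  # "source" is an assumed metadata key
        if src not in seen:
            seen.append(src)
    return seen


def answer_with_citations(question, retrieve, llm):
    """Same flow as answer(), but returns the sources next to the text."""
    top, _best = retrieve(question)
    if not top:  # grounding guardrail: nothing relevant, so skip the LLM call
        return {"answer": "I don't have information on that in my sources.",
                "sources": []}
    prompt = (
        "Answer using ONLY the numbered context and cite passages like [1]. "
        "If it's not there, say you don't know.\n\n"
        f"Context:\n{build_context(top)}\n\nQuestion: {question}"
    )
    return {"answer": llm(prompt), "sources": unique_sources(top)}
```

Returning a dict keeps the answer text and its sources separate, so the UI decides how to display them.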

What makes it a product, not a demo

Grounding guardrail: the min_score check skips the LLM call when no chunk is relevant enough, and the "say you don't know" instruction covers retrieved context that doesn't contain the answer. Together they cut down on confident hallucinations.
Citations: keep each chunk's source metadata and show it with the answer.
Evaluation: keep a small set of real questions with known-good answers and re-run it after every change (see Reranking & Next Steps).
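
The evaluation habit described above can start as a very small regression script. The sketch below is one possible shape, under stated assumptions: the case format, the `must_contain` keyword check, and the example questions are hypothetical, and keyword matching is only a crude proxy for a known-good answer. `fake_answer` stands in for the real `answer` function so the block runs without a model.

```python
def evaluate(answer_fn, cases):
    """Run every case; a case fails if any required keyword is missing."""
    failures = []
    for case in cases:
        got = answer_fn(case["question"])
        missing = [kw for kw in case["must_contain"]
                   if kw.lower() not in got.lower()]
        if missing:
            failures.append({"question": case["question"],
                             "missing": missing,
                             "got": got})
    return len(cases) - len(failures), failures


# Hypothetical cases: one answerable question, one that should be refused.
EVAL_CASES = [
    {"question": "How long do refunds take?", "must_contain": ["5 days"]},
    {"question": "What is the CEO's shoe size?",
     "must_contain": ["don't have information"]},
]


def fake_answer(question):
    """Stand-in for answer(); replace with the real function."""
    if "refund" in question.lower():
        return "Refunds take 5 days."
    return "I don't have information on that in my sources."


if __name__ == "__main__":
    passed, failures = evaluate(fake_answer, EVAL_CASES)
    print(f"{passed}/{len(EVAL_CASES)} passed")
    for f in failures:
        print("FAILED:", f["question"], "missing", f["missing"])
```

Including at least one question the sources cannot answer checks that the guardrail still refuses after a change, not just that good answers stay good.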

Next: Conversational RAG →
