1. What RAG Is
Retrieval-Augmented Generation explained from first principles — why it exists, the two-phase flow, and when to use it.
Splitting where meaning shifts instead of at fixed sizes — how it works, the cost trade-off, and when it's worth it.
Letting an LLM decide chunk boundaries like a human editor — the highest-quality, highest-cost chunking approach and when it pays off.
Extending RAG beyond text to images, diagrams, and scanned pages — the two main architectures and when to use each.
Beyond plain top-k — metadata filtering, MMR for diversity, small-to-big retrieval, and parent-document retrieval.
One question, many phrasings — generating query variants to widen recall and catch chunks a single wording would miss.
How to merge multiple ranked result lists into one — the RRF formula, why it drops scores and keeps ranks, and a clean implementation.
Combining dense (vector) and keyword (BM25) retrieval so you get both semantic recall and exact-match precision.
The final precision step — cross-encoder reranking — plus the full production retrieval pipeline and where to go next.
What embeddings are, how vector space encodes meaning, and the end-to-end RAG architecture components.
Part 3 of the learning path — Retrieval-Augmented Generation in 17 sections, from first principles to production retrieval, each with summary points, code, and diagrams.
Building the indexing pipeline — load, clean, chunk, embed, and store documents so they're ready to retrieve.
Implementing the retriever — embedding the query, finding the nearest chunks, and the top-k / threshold knobs that control quality.
The math that ranks chunks — cosine similarity, why it ignores magnitude, and why normalized vectors let you use a fast dot product.
Assembling the pieces into one working end-to-end RAG application — index, retrieve, prompt, generate — with a grounding guardrail.
Making RAG work in a multi-turn chat — the follow-up problem and how query rewriting (history-aware retrieval) fixes it.
Why chunking is the highest-leverage knob in RAG, the main strategies, and recommended chunk size and overlap.
Structure- and format-aware splitting — recursive separators, Markdown/code-aware splitters, and token-based sizing.
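
As a small illustration of the rank-fusion blurb above (merging multiple ranked result lists with RRF), here is a minimal sketch rather than the learning path's own implementation. It assumes the usual form of the formula, score(d) = sum over lists of 1 / (k + rank of d in that list), with ranks starting at 1. The function name `reciprocal_rank_fusion`, the default `k=60`, and the sample document ids are illustrative assumptions, not taken from the sections listed here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Only positions are used; any retriever scores are ignored, which is
    why lists from different retrievers (vector, BM25) can be combined
    without normalizing their scores. k=60 is a commonly used constant
    and is an assumption here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based: the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a dense retriever and a keyword retriever.
dense_hits = ["a", "b", "c"]
keyword_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)
```

Documents that appear near the top of both lists ("b" and "a" in this sample) rise above documents that appear in only one, which is the behavior hybrid search relies on.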