18 docs tagged with "RAG"

1. What RAG Is

Retrieval-Augmented Generation explained from first principles — why it exists, the two-phase flow, and when to use it.

10. Semantic Chunking

Splitting where meaning shifts instead of at fixed sizes — how it works, the cost trade-off, and when it's worth it.
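
A minimal sketch of the idea, assuming each sentence has already been embedded (the toy 2-D vectors in the test stand in for real embeddings). It splits wherever the cosine similarity between consecutive sentences falls below a fixed `threshold`. That fixed cutoff is an illustrative assumption; some implementations use a percentile of the observed distances instead. The names `semantic_chunks` and `threshold` are hypothetical.

```python
import math

def cosine(a, b):
    # Assumes non-zero vectors (real sentence embeddings are never all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_chunks(sentences, embeddings, threshold=0.5):
    """Group consecutive sentences; start a new chunk when meaning shifts.

    Hypothetical helper: `embeddings[i]` is the vector for `sentences[i]`.
    """
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        sim = cosine(embeddings[i - 1], embeddings[i])
        if sim < threshold:
            chunks.append([sentences[i]])   # similarity dropped: new topic
        else:
            chunks[-1].append(sentences[i])  # same topic: extend current chunk
    return [" ".join(c) for c in chunks]
```

The cost trade-off is visible here: every sentence needs its own embedding call before any chunk exists, whereas fixed-size splitting needs none.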

11. Agentic Chunking

Letting an LLM decide chunk boundaries like a human editor — the highest-quality, highest-cost chunking approach and when it pays off.

12. Multi-Modal RAG

Extending RAG beyond text to images, diagrams, and scanned pages — the two main architectures and when to use each.

13. Advanced Retrieval Techniques

Beyond plain top-k: metadata filtering, maximal marginal relevance (MMR) for diversity, small-to-big retrieval, and parent-document retrieval.
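
A minimal sketch of MMR in plain Python, assuming query and chunk vectors are already computed. At each step it picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the highest similarity to anything already selected. The function names and the `lambda_mult` parameter are illustrative assumptions, not a specific library's API.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of k documents chosen by maximal marginal relevance.

    lambda_mult = 1.0 reduces to plain top-k by relevance;
    lower values trade relevance for diversity.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy = similarity to the closest already-selected document.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a near-duplicate of the top hit in the pool, plain top-2 would return both copies; MMR with a low enough `lambda_mult` swaps the duplicate for a less relevant but different chunk.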

14. Multi-Query RAG

One question, many phrasings — generating query variants to widen recall and catch chunks a single wording would miss.

15. Reciprocal Rank Fusion

How to merge multiple ranked result lists into one: the RRF formula, why it discards raw scores and keeps only ranks, and a clean implementation.
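
A minimal sketch of RRF, assuming each retriever returns an ordered list of document IDs, best first. A document's fused score is the sum of `1 / (k + rank)` over every list it appears in, with ranks starting at 1. Because only positions enter the formula, scores from incompatible scales (cosine vs. BM25) never need to be compared. `k=60` is the commonly used default; treat it as a tunable assumption. The function name is hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first lists of doc IDs into one best-first list."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # position only, raw score ignored
    return sorted(scores, key=lambda d: scores[d], reverse=True)

dense = ["a", "b", "c"]    # e.g. vector-search results
keyword = ["b", "c", "d"]  # e.g. BM25 results
print(reciprocal_rank_fusion([dense, keyword]))  # ['b', 'c', 'a', 'd']
```

Documents that appear in both lists ("b", "c") rise above a document ranked first by only one retriever ("a").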

16. Hybrid Search

Combining dense (vector) and keyword (BM25) retrieval so you get both semantic recall and exact-match precision.

17. Reranking & Next Steps

The final precision step — cross-encoder reranking — plus the full production retrieval pipeline and where to go next.

2. Embeddings & RAG Architecture

What embeddings are, how vector space encodes meaning, and the end-to-end RAG architecture components.

3 · Retrieval (RAG) — Overview

Part 3 of the learning path — Retrieval-Augmented Generation in 17 sections, from first principles to production retrieval, each with summary points, code, and diagrams.

3. Data Ingestion Pipeline

Building the indexing pipeline — load, clean, chunk, embed, and store documents so they're ready to retrieve.

4. Document Retrieval

Implementing the retriever — embedding the query, finding the nearest chunks, and the top-k / threshold knobs that control quality.
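
A minimal sketch of the two knobs, assuming the query has already been embedded and the index is a plain dict of chunk ID to vector (a stand-in for a real vector store). `top_k` caps how many chunks come back; `min_score` drops weak matches even when fewer than `top_k` survive. The names `retrieve`, `top_k`, and `min_score` are hypothetical.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k=3, min_score=0.0):
    """Return up to top_k (chunk_id, score) pairs scoring at least min_score."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    kept = [(s, cid) for s, cid in scored if s >= min_score]  # threshold filter
    best = heapq.nlargest(top_k, kept)                        # top-k, best first
    return [(cid, s) for s, cid in best]

index = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
print([cid for cid, _ in retrieve([1.0, 0.0], index, top_k=2, min_score=0.5)])  # ['a', 'b']
```

Raising `min_score` trades recall for precision: the model sees fewer chunks, but fewer irrelevant ones.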

5. Cosine Similarity

The math that ranks chunks — cosine similarity, why it ignores magnitude, and why normalized vectors let you use a fast dot product.
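
A small worked example of cos(a, b) = a·b / (‖a‖ ‖b‖), in plain Python for illustration. Vectors pointing the same way score 1.0 regardless of length, and orthogonal vectors score 0.0. Once both vectors are normalized to unit length, the denominator is 1, so the dot product alone yields the same value. This is why normalizing at indexing time lets a store skip the division on every query.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice the magnitude
c = [4.0, -3.0]  # orthogonal to a

print(cosine_similarity(a, b))  # 1.0  (magnitude is ignored)
print(cosine_similarity(a, c))  # 0.0
# Unit-length vectors: plain dot product equals cosine (up to float rounding).
print(dot(normalize(a), normalize(b)))
```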

6. Your First RAG App

Assembling the pieces into one working end-to-end RAG application — index, retrieve, prompt, generate — with a grounding guardrail.

7. Conversational RAG

Making RAG work in a multi-turn chat — the follow-up problem and how query rewriting (history-aware retrieval) fixes it.

8. Chunking Strategies

Why chunking is the highest-leverage knob in RAG, the main strategies, and recommended chunk size and overlap.
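
A minimal sketch of fixed-size chunking with overlap, measured in characters for simplicity (token-based sizing is the more common production choice). Each window starts `chunk_size - overlap` characters after the previous one, so neighbouring chunks share `overlap` characters of context. The defaults of 500 and 50 are placeholders, not the recommended values from the article, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence cut at one boundary still appears whole in the neighbouring chunk, at the cost of some duplicated storage and embedding work.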

9. Advanced Text Splitting

Structure- and format-aware splitting — recursive separators, Markdown/code-aware splitters, and token-based sizing.