8. Chunking Strategies
Chunking decides what a "retrievable unit" is. Because retrieval works on whole chunks, a fact split across two chunks, or buried in a giant one, becomes hard to surface. That makes chunking one of the highest-leverage knobs in RAG.
The strategies at a glance
Fixed |■■■■|■■■■|■■■■| every N chars — simple, cuts sentences
Recursive |■■■ |■■■■■|■■ | split on ¶ → line → sentence — structure-aware ✅
Semantic |■■■■■|■■|■■■■■| split where meaning shifts — coherent, costly
Agentic | LLM decides | human-like splits — best quality, slowest
| Strategy | Best for |
|---|---|
| Fixed | quick prototypes, uniform text |
| Recursive | most production RAG (start here) |
| Semantic | when recursive plateaus and budget allows |
| Agentic | messy/mixed docs where quality is critical |
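To make the "coarse → fine" idea behind the recursive strategy concrete, here is a minimal pure-Python sketch. It is not how any particular library implements it: the function name `recursive_split`, the `max_chars` parameter, and the separator list are assumptions chosen for illustration, and sizes are counted in characters for simplicity.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", "\n", ". ", " ")):
    """Illustrative sketch: split on the coarsest separator first and only
    fall back to finer separators for pieces that are still too long."""
    if len(text) <= max_chars:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard fixed-size cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate              # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)           # close the chunk built so far
        if len(piece) <= max_chars:
            current = piece                  # start a new chunk with this piece
        else:
            # One piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    # Note: separators at chunk boundaries are dropped in this toy version.
    return [c for c in chunks if c.strip()]


doc = "Para one is short.\n\nPara two is a bit longer. It has two sentences.\n\nThird."
for chunk in recursive_split(doc, max_chars=40):
    print(repr(chunk))
```

Short paragraphs stay whole; only the paragraph that exceeds the limit gets broken at sentence boundaries. This is why the recursive strategy tends to respect document structure better than a fixed cut.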
Size and overlap
A solid default: ~400–512 tokens per chunk with 10–20% overlap. Overlap means consecutive chunks share a little text so a fact sitting on a boundary still appears whole in at least one chunk.
chunk A: [........ overlap]
chunk B: [overlap ........]
└─ shared so boundary facts survive
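The overlap picture above boils down to a sliding window whose step is `size - overlap`. The sketch below assumes the text has already been tokenized into a list; `window_chunks` and the integer stand-in tokens are hypothetical names used only for illustration.

```python
def window_chunks(tokens, size=512, overlap=64):
    """Slide a window of `size` items forward by `size - overlap` each step."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


tokens = list(range(1000))            # stand-in for real token ids
chunks = window_chunks(tokens)        # 64 / 512 = 12.5% overlap

# The tail of one chunk is repeated at the head of the next.
assert chunks[0][-64:] == chunks[1][:64]
```

Larger overlap protects more boundary facts but stores and embeds more duplicated text, so the 10–20% range is a trade-off rather than a rule.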
Code — the default
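from langchain_text_splitters import RecursiveCharacterTextSplitter

# Note: chunk_size and chunk_overlap are measured with length_function,
# which defaults to len, so these numbers are characters, not tokens.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,  # ~12%
    separators=["\n\n", "\n", ". ", " ", ""],  # coarse → fine
)
chunks = splitter.create_documents([text])  # text: the document as a str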
Chunk by meaning, keep a small overlap, attach metadata. Tune size against a real eval set — not by eyeballing.
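One way to tune against an eval set instead of eyeballing is to sweep chunk sizes and measure how often the retrieved chunks contain the expected answer. The sketch below is a toy under loud assumptions: the word-overlap `retrieve` function stands in for a real embedding retriever, sizes are counted in words, and the corpus and question/answer pairs are made up for illustration.

```python
def chunk_words(text, size, overlap):
    """Fixed-size word windows with overlap (toy chunker for the sweep)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def retrieve(query, chunks, k=2):
    """Stand-in retriever: rank chunks by shared lowercase words with the query."""
    q = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return ranked[:k]


def hit_rate(text, eval_set, size, overlap, k=2):
    """Fraction of questions whose answer string appears in a top-k chunk."""
    chunks = chunk_words(text, size, overlap)
    hits = sum(
        any(answer in chunk for chunk in retrieve(question, chunks, k))
        for question, answer in eval_set
    )
    return hits / len(eval_set)


# Hypothetical corpus and eval pairs, for illustration only.
corpus = (
    "The cache expires after ten minutes. "
    "Retries use exponential backoff with jitter. "
    "Logs rotate daily at midnight UTC."
)
eval_set = [
    ("when does the cache expire", "ten minutes"),
    ("how do logs rotate", "midnight UTC"),
]

for size in (4, 8, 16):
    score = hit_rate(corpus, eval_set, size=size, overlap=size // 4)
    print(f"size={size:>2} words  hit_rate={score:.2f}")
```

With a real pipeline, the same loop would swap in the production splitter and retriever and use a few dozen questions drawn from actual user queries.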
The next three parts go deeper into splitting techniques.