
8. Chunking Strategies

Chunking decides what a "retrievable unit" is. Because retrieval works on whole chunks, a fact that is split across two chunks, or buried in a very large one, becomes hard to surface. That makes chunking one of the highest-leverage knobs in RAG.

The strategies at a glance

 Fixed      |■■■■|■■■■|■■■■|   every N chars — simple, cuts sentences
 Recursive  |■■■ |■■■■■|■■ |   split on ¶ → line → sentence — structure-aware ✅
 Semantic   |■■■■■|■■|■■■■■|   split where meaning shifts — coherent, costly
 Agentic    | LLM decides  |   human-like splits — best quality, slowest

Strategy	Best for
Fixed	quick prototypes, uniform text
Recursive	most production RAG (start here)
Semantic	when recursive plateaus and budget allows
Agentic	messy/mixed docs where quality is critical
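To make the "cuts sentences" trade-off of the Fixed strategy concrete, here is a minimal sketch in plain Python. The helper name `fixed_chunks` and the sample sentence are illustrative assumptions, not part of any library.

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample text, used only for illustration.
sample = "Refunds are issued within 30 days. Contact support to start one."
pieces = fixed_chunks(sample, 20)

for piece in pieces:
    print(repr(piece))
# The first piece is 'Refunds are issued w': the cut lands mid-word,
# because fixed-size splitting ignores sentence and word boundaries.
```

Nothing is lost (joining the pieces gives back the original text), but no single piece is guaranteed to hold a complete sentence, which is why the Recursive strategy is the suggested starting point.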

Size and overlap

A solid default: ~400–512 tokens per chunk with 10–20% overlap (roughly 40–100 tokens). Overlap means consecutive chunks share a little text, so a fact sitting on a boundary still appears whole in at least one chunk.

 chunk A: [........ overlap]
 chunk B:          [overlap ........]
                    └─ shared so boundary facts survive
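The overlap diagram can be sketched as a sliding window: each chunk starts `size - overlap` tokens after the previous one. This is a simplified illustration; the function name `chunk_with_overlap` and the placeholder tokens are assumptions, and a real pipeline would use the model's tokenizer rather than a ready-made token list.

```python
def chunk_with_overlap(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Return windows of `size` tokens where neighbours share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# Placeholder tokens t0..t9 stand in for real tokenizer output.
tokens = [f"t{i}" for i in range(10)]
chunks = chunk_with_overlap(tokens, size=4, overlap=1)

for chunk in chunks:
    print(chunk)
```

With `size=4` and `overlap=1`, token `t3` closes the first chunk and opens the second, so anything sitting on that boundary is present in both.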

Code — the default

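from langchain_text_splitters import RecursiveCharacterTextSplitter

# `text` is the raw document string to split (loaded elsewhere).
# Note: chunk_size and chunk_overlap are measured by length_function, which
# defaults to len, so these values count characters, not tokens.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,                              # 64 / 512 = 12.5%
    separators=["\n\n", "\n", ". ", " ", ""],      # coarse → fine
)
chunks = splitter.create_documents([text])         # list of Document objects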

Chunk by meaning, keep a small overlap, attach metadata. Tune size against a real eval set — not by eyeballing.
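As a rough sketch of what tuning against an eval set can look like, the snippet below sweeps a few chunk sizes and measures how often the top-ranked chunk contains the expected answer. Everything here is an assumption made for illustration: the tiny text, the two question/answer pairs, the word-overlap `score` function (a stand-in for a real embedding retriever), and the helper names.

```python
import re


def split_words(text: str, size: int, overlap: int) -> list[str]:
    """Word-window chunks of `size` words sharing `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def score(query: str, chunk: str) -> int:
    """Toy relevance: number of shared lowercase words."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.lower()))
    return len(q & c)


def hit_rate(text: str, eval_set: list[tuple[str, str]], size: int, overlap: int) -> float:
    """Fraction of questions whose top-1 chunk contains the expected answer."""
    chunks = split_words(text, size, overlap)
    hits = 0
    for question, answer in eval_set:
        best = max(chunks, key=lambda ch: score(question, ch))
        hits += answer in best
    return hits / len(eval_set)


# Hypothetical corpus and eval set, for illustration only.
text = (
    "Refunds are issued within 30 days of purchase. "
    "Shipping takes five business days inside the EU. "
    "Support is available by email on weekdays."
)
eval_set = [
    ("How long do refunds take?", "30 days"),
    ("How long does shipping take?", "five business days"),
]

results = {size: hit_rate(text, eval_set, size, overlap=2) for size in (4, 8, 16)}
for size, rate in results.items():
    print(f"chunk size {size:>2} words -> hit rate {rate:.2f}")
```

In this toy setup the smallest chunks separate each question's keywords from its answer, so the hit rate drops. A real evaluation would use your own documents, a representative question set, and the retriever you plan to ship.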

The next three parts go deeper into splitting techniques.

