11. Agentic Chunking

Agentic chunking hands the splitting decision to an LLM. The model reads the document and groups it into self-contained, meaningful units the way a human editor would, often rewriting each unit into a clear, standalone "proposition."

How it works

 document ─▶ LLM ─▶ "break this into self-contained idea units"
                      │
                      ▼
        ["unit 1: ...", "unit 2: ...", ...]  each coherent + standalone

Unlike semantic chunking, which only places boundaries where the similarity between neighboring sentences drops, the LLM can reorganize, merge, and reword so each chunk stands on its own without surrounding context.
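To make "stands on its own" concrete, one option is a cheap check on the chunks the LLM returns. The sketch below is an illustrative heuristic, not a standard method: it flags chunks whose first word is a pronoun or connective that usually points back at earlier text. The word list and the name `looks_dependent` are assumptions made for this example.

```python
# Hypothetical heuristic: openers that usually refer back to earlier text.
DANGLING_OPENERS = {
    "it", "this", "that", "these", "those", "they", "he", "she",
    "however", "therefore", "also",
}

def looks_dependent(chunk: str) -> bool:
    """Return True if the chunk's first word suggests it leans on prior context."""
    words = chunk.strip().split()
    if not words:
        return True  # an empty chunk cannot stand alone
    first = words[0].strip(".,;:!?\"'()").lower()
    return first in DANGLING_OPENERS

chunks = [
    "Agentic chunking asks an LLM to choose chunk boundaries.",
    "It also rewrites each unit so it reads on its own.",
]
# Collect the chunks that probably need rewriting or merging.
flagged = [c for c in chunks if looks_dependent(c)]
print(flagged)
```

A check like this will miss plenty of cases and raise false alarms on others, so it is better treated as a prompt for manual review than as a filter.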

Code — sketch

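import json

def agentic_chunk(text):
    """Ask an LLM to split `text` into standalone chunks (a list of strings)."""
    prompt = (
        "Split the document into self-contained chunks. Each chunk should cover "
        "one idea and make sense on its own. Return a JSON list of strings.\n\n"
        f"Document:\n{text}"
    )
    # llm() is a placeholder for your model client: it takes a prompt string
    # and returns the model's reply as a string.
    return json.loads(llm(prompt))     # list of standalone chunks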
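Model output is not guaranteed to be clean JSON, so the parse step in the sketch above usually needs some defence. The version below assumes the model may wrap its answer in a Markdown code fence, and it validates that the result is a list of non-empty strings. `parse_chunks` and `fake_llm` are hypothetical names; `fake_llm` is a stand-in so the example runs without a model.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker

def parse_chunks(raw: str) -> list[str]:
    """Parse an LLM reply that should contain a JSON list of strings."""
    cleaned = raw.strip()
    # Assumption: some models wrap JSON in a fenced block; strip the wrapper.
    if cleaned.startswith(FENCE):
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    chunks = json.loads(cleaned)
    if not isinstance(chunks, list) or not all(isinstance(c, str) for c in chunks):
        raise ValueError("expected a JSON list of strings")
    # Trim whitespace and drop empty entries.
    return [c.strip() for c in chunks if c.strip()]

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned, fenced reply."""
    return FENCE + 'json\n["Chunk one.", "Chunk two."]\n' + FENCE

print(parse_chunks(fake_llm("Split the document...")))
```

If the reply still fails to parse, `json.loads` raises `json.JSONDecodeError`; retrying the call or falling back to a cheaper splitter are both reasonable responses, depending on your pipeline.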

The trade-off

           Agentic                 Semantic   Recursive
Quality    highest                 high       good
Cost       highest (LLM per doc)   medium     lowest
Speed      slowest                 medium     fastest

When it's worth it

Reserve agentic chunking for messy or mixed-format documents where retrieval quality is critical and the corpus is small enough that an LLM pass per document is affordable. For large or uniform corpora, recursive (or semantic) wins on cost.
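Whether an LLM pass per document is affordable comes down to simple arithmetic. The sketch below estimates a one-off chunking cost from token counts. The function name, the default assumption that the model emits about as many tokens as it reads, and the prices in the usage lines are all placeholders, not real rates. Prompt overhead is ignored.

```python
def estimate_chunking_cost(
    num_docs: int,
    avg_tokens_per_doc: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    output_ratio: float = 1.0,
) -> float:
    """Rough one-off cost of an LLM pass over every document.

    output_ratio is output tokens / input tokens. The default of 1.0 assumes
    the model re-emits about as much text as it reads, which is a guess.
    """
    input_tokens = num_docs * avg_tokens_per_doc
    output_tokens = input_tokens * output_ratio
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000  # prices are per million tokens

# Placeholder prices per million tokens, chosen only to show the arithmetic.
small = estimate_chunking_cost(500, 2_000, 1.0, 4.0)
large = estimate_chunking_cost(5_000_000, 2_000, 1.0, 4.0)
print(f"{small:.2f} vs {large:.2f}")
```

With the same placeholder prices, the cost scales linearly with corpus size, which is why the same technique can be trivial for a few hundred documents and prohibitive for millions.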

Quality ladder: recursive → semantic → agentic. Climb only as far as your eval set and budget justify.
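The ladder rule can be written down as a small selection routine. This is a sketch under assumptions: each entry holds a retrieval metric you measured on your own eval set plus an estimated cost, `min_gain` is the smallest improvement you consider worth paying for, and `budget` caps the cost. All names are hypothetical and the numbers are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class StrategyResult:
    name: str
    score: float  # retrieval metric on your eval set, higher is better
    cost: float   # estimated chunking cost, same unit as the budget

def climb_ladder(ladder: list[StrategyResult], min_gain: float, budget: float) -> str:
    """Walk the ladder from cheapest to most expensive and stop at the first
    step that is over budget or fails to improve the score by min_gain."""
    chosen = ladder[0]
    for candidate in ladder[1:]:
        if candidate.cost > budget:
            break
        if candidate.score - chosen.score < min_gain:
            break
        chosen = candidate
    return chosen.name

# Invented numbers, for illustration only.
ladder = [
    StrategyResult("recursive", score=0.71, cost=0.0),
    StrategyResult("semantic", score=0.78, cost=3.0),
    StrategyResult("agentic", score=0.79, cost=40.0),
]
print(climb_ladder(ladder, min_gain=0.03, budget=50.0))
```

The routine stops at the first rung that does not pay off rather than skipping ahead, which mirrors "climb only as far as justified". If you would rather compare every strategy against the baseline independently, replace the early `break` with a `continue`.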
