10. Semantic Chunking

Semantic chunking places boundaries where the topic changes, not at fixed character counts, so each chunk tends to cover one coherent idea.

How it works

split doc into sentences
embed every sentence
walk neighbours; measure similarity between consecutive sentences
start a NEW chunk wherever similarity drops below a threshold (topic shift)

 sent: A  A  A | B  B | C  C  C
 sim:  hi hi  ↓lo  hi ↓lo  hi hi
              break    break
 → chunks: [A A A] [B B] [C C C]
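
The steps above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names and the `0.2` threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def embed(sentence: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text: str, threshold: float = 0.2) -> list[str]:
    # 1. split doc into sentences (naive: break after ., ! or ?)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    # 2. embed every sentence
    vectors = [embed(s) for s in sentences]
    # 3. walk neighbours and compare each sentence with the previous one
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        # 4. similarity below the threshold means a topic shift: start a new chunk
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

Calling `semantic_chunks("Cats sleep a lot. Cats sleep on sofas. Python runs scripts. Python runs tests too.")` returns two chunks, one about cats and one about Python. With a real embedding model the structure stays the same; only `embed` and the similarity scale change.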

Code

# pip install langchain-experimental langchain-huggingface sentence-transformers
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings

text = "..."  # placeholder: the raw document text you want to split

splitter = SemanticChunker(
    embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    breakpoint_threshold_type="percentile",   # break at the biggest similarity drops
)
chunks = splitter.create_documents([text])  # list of Document objects, one per chunk
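
To make the `"percentile"` setting less of a black box, here is a rough sketch of how a percentile cutoff can pick breakpoints. It assumes the splitter works on distances (1 minus similarity) between consecutive sentences and breaks wherever a distance exceeds the chosen percentile. The function names, the default of 95, and the sample distances are illustrative assumptions, so check the library source for the exact behaviour of your installed version.

```python
def percentile(values: list[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def breakpoints(distances: list[float], q: float = 95.0) -> list[int]:
    """Indices i where the gap between sentence i and i+1 exceeds the cutoff."""
    if not distances:
        return []
    cutoff = percentile(distances, q)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Made-up cosine distances between 8 consecutive sentences (7 gaps).
distances = [0.10, 0.12, 0.80, 0.11, 0.75, 0.10, 0.09]
print(breakpoints(distances, q=70))  # -> [2, 4]
print(breakpoints(distances, q=95))  # -> [2]
```

A lower percentile produces more, smaller chunks; a higher one keeps only the sharpest topic shifts. Because the cutoff is relative to each document's own distances, the same setting adapts across documents with different vocabularies.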

The trade-off

It produces more coherent chunks, but it has to embed every sentence just to decide where to split, so a long document costs hundreds of extra embedding calls at indexing time. That means more compute, higher cost, and slower ingestion.
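
A back-of-the-envelope estimate shows the size of that overhead. The sketch below assumes one embedding per sentence for boundary detection, one embedding per final chunk for the index, and that both strategies end up with roughly the same number of chunks. The function name and the sample numbers are hypothetical.

```python
import math


def embedding_calls(num_sentences: int, sentences_per_chunk: int) -> dict[str, int]:
    """Rough count of embeddings computed at indexing time per strategy."""
    chunks = math.ceil(num_sentences / sentences_per_chunk)
    return {
        "recursive": chunks,                 # embed each final chunk once
        "semantic": num_sentences + chunks,  # embed every sentence, then each chunk
    }


print(embedding_calls(num_sentences=600, sentences_per_chunk=5))
# -> {'recursive': 120, 'semantic': 720}
```

Under these assumptions a 600-sentence document needs about six times as many embeddings with semantic chunking, all paid once at ingestion rather than at query time.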

Reach for semantic chunking when recursive splitting plateaus on your eval set and the indexing-time cost is acceptable. Otherwise, recursive splitting is the better default.
