10. Semantic Chunking
Semantic chunking places boundaries where the topic changes, not at fixed character counts — so each chunk is one coherent idea.
How it works
1. split doc into sentences
2. embed every sentence
3. walk neighbours; measure similarity between consecutive sentences
4. start a NEW chunk wherever similarity drops below a threshold (topic shift)
sent:   A     A     A  |  B     B  |  C     C     C
sim:       hi    hi   ↓lo    hi   ↓lo    hi    hi
                     break       break
→ chunks: [A A A] [B B] [C C C]
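The four steps above can be sketched in plain Python. This is a minimal illustration under loud assumptions, not what any library does internally: `toy_embed` is a bag-of-words stand-in for a real embedding model, and the names `split_sentences`, `cosine`, `semantic_chunks` and the 0.3 threshold are made up for the example.

```python
import math
import re
from collections import Counter

def split_sentences(doc):
    # Step 1: naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

def toy_embed(sentence):
    # Step 2 (stand-in): word counts instead of a real embedding vector.
    return Counter(re.findall(r"[a-z]+", sentence.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_chunks(doc, threshold=0.3):
    sentences = split_sentences(doc)
    if not sentences:
        return []
    vectors = [toy_embed(s) for s in sentences]
    chunks = [[sentences[0]]]
    # Step 3: compare each sentence with the one before it.
    for prev, cur, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append([sent])    # Step 4: similarity dropped, new chunk
        else:
            chunks[-1].append(sent)  # same topic, extend the current chunk
    return [" ".join(c) for c in chunks]

doc = ("The cat sat on the mat. The cat slept on the mat. "
       "Python runs scripts quickly. Python scripts need an interpreter.")
for chunk in semantic_chunks(doc):
    print(chunk)
```

On this toy input the two cat sentences land in one chunk and the two Python sentences in another. With a real embedding model the threshold would need tuning, since similarity scores are distributed very differently from word-overlap scores.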
Code
# pip install langchain-experimental langchain-huggingface sentence-transformers
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings

# The embedding model is used only to score sentence-to-sentence similarity
splitter = SemanticChunker(
    embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    breakpoint_threshold_type="percentile",  # break at the biggest similarity drops
)
# text: the full document as a single string
chunks = splitter.create_documents([text])
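To make `breakpoint_threshold_type="percentile"` less abstract, here is a rough sketch of what a percentile threshold means. It is not the library's actual code. The idea is to turn each neighbour similarity into a distance and break only where the distance is above a chosen percentile of all distances. The helper names `percentile` and `breakpoints` are hypothetical, and the default of 95 is an assumption about a typical setting.

```python
def percentile(values, pct):
    # Percentile with linear interpolation between the two nearest ranks.
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def breakpoints(similarities, pct=95):
    # similarities[i] is the score between sentence i and sentence i + 1.
    distances = [1 - s for s in similarities]
    cutoff = percentile(distances, pct)
    # A returned index i means "start a new chunk after sentence i".
    return [i for i, d in enumerate(distances) if d > cutoff]

# The hi/hi/lo/hi/lo/hi/hi pattern from the diagram, as made-up numbers.
sims = [0.9, 0.8, 0.2, 0.85, 0.1, 0.9, 0.8]
print(breakpoints(sims, pct=70))
```

A lower percentile yields more, smaller chunks; a higher one keeps only the sharpest topic shifts. Because the cutoff is relative to the document's own distances, it adapts to documents whose sentences are uniformly similar or uniformly dissimilar.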
The trade-off
It produces noticeably more coherent chunks, but it embeds every sentence just to decide the splits — so for a long document you generate hundreds of extra embeddings at indexing time. That's more compute/cost and slower ingestion.
Reach for semantic chunking when recursive chunking plateaus on your eval set and the indexing-time cost is acceptable. Otherwise, recursive chunking is the better default.
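To put a rough number on the indexing cost, the back-of-the-envelope sketch below counts embedding calls. It assumes one embedding per sentence to find the splits plus one per final chunk to index, versus one per chunk for recursive chunking. The function name and the example figures are illustrative, not measurements.

```python
def embedding_calls(n_sentences, n_chunks):
    # Recursive chunking embeds only the final chunks.
    # Semantic chunking (as assumed here) also embeds every sentence first.
    return {
        "recursive": n_chunks,
        "semantic": n_sentences + n_chunks,
    }

# Hypothetical document: 400 sentences that end up as 40 chunks.
calls = embedding_calls(n_sentences=400, n_chunks=40)
print(calls)
print(calls["semantic"] / calls["recursive"])
```

The ratio grows as chunks get larger relative to sentences, which is why the overhead matters most for long documents.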