16. Hybrid Search

Hybrid search runs dense (vector) and keyword (BM25) retrieval side by side and fuses the two result lists, so each method covers the other's blind spot.

Why both

Method	Strength	Blind spot
Dense (vectors)	meaning, paraphrases	exact terms: codes, names, acronyms
BM25 (keywords)	exact tokens	synonyms, rephrasing

A query like "ERR_4012 on checkout" needs both the exact code (BM25) and the concept "checkout error" (dense). Either method alone underperforms.

The pipeline

                ┌─▶ DENSE  ─▶ ranked list A ─┐
 query ─────────┤                             ├─▶ RRF ─▶ fused results
                └─▶ BM25   ─▶ ranked list B ─┘

The fusion step is Reciprocal Rank Fusion (RRF). Dense similarity and BM25 scores live on incompatible scales, so RRF ignores the raw scores and merges the two lists by rank position alone.
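As a reminder of what the fusion step computes, here is a minimal sketch of RRF. It assumes the common form in which each list contributes 1 / (k + rank) to an item's score, with k = 60 as a widely used default. The function name and signature mirror the call in the code for this lesson, but the version from the earlier lesson may differ in detail.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked ID lists; each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        # rank starts at 1 for the top item of each list
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense_ids = ["a", "b", "c"]   # hypothetical dense ranking
bm25_ids = ["c", "a", "d"]    # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_ids, bm25_ids])
print(fused)  # ['a', 'c', 'b', 'd']
```

Items that appear in both lists ("a" and "c") rise to the top, even though no raw score from either retriever was ever compared.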

Code

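from rank_bm25 import BM25Okapi
import numpy as np

# Assumes `chunks` (dicts with "id", "text", "vector") and
# reciprocal_rank_fusion() already exist, as built in the earlier lessons.

def tokenize(text):
    # lowercase so "Checkout" matches "checkout"; a code like ERR_4012 stays one token
    return text.lower().split()

# BM25 over tokenized chunks
bm25 = BM25Okapi([tokenize(c["text"]) for c in chunks])

def hybrid(query, query_vec, k=10):
    # dense ranking: dot product equals cosine similarity only for unit-length vectors
    dense = sorted(chunks, key=lambda c: float(query_vec @ c["vector"]), reverse=True)
    dense_ids = [c["id"] for c in dense[:k]]
    # keyword ranking: indices of the k highest BM25 scores
    bm = np.argsort(bm25.get_scores(tokenize(query)))[::-1][:k]
    bm25_ids = [chunks[i]["id"] for i in bm]
    # fuse the two ranked ID lists (this k is the top-k cutoff, not RRF's k)
    return reciprocal_rank_fusion([dense_ids, bm25_ids])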
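To see the flow end to end without installing anything, the sketch below runs the same pipeline on a toy corpus. Everything in it is an illustrative assumption: the 2-D vectors are hand-written stand-ins for real embeddings, a shared-token count stands in for BM25, and `rrf` and `toy_hybrid` are hypothetical helper names.

```python
from collections import defaultdict

# Toy corpus. The 2-D vectors are hand-written stand-ins, not real embeddings.
chunks = [
    {"id": "c1", "text": "ERR_4012 payment gateway timeout", "vector": [0.2, 0.9]},
    {"id": "c2", "text": "customers cannot complete their purchase", "vector": [0.9, 0.3]},
    {"id": "c3", "text": "reset your password", "vector": [0.1, 0.1]},
]

def rrf(ranked_lists, k=60):
    # Assumed RRF form: each list adds 1 / (k + rank) to an ID's score.
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def toy_hybrid(query, query_vec, top_k=2):
    # Dense side: rank chunks by dot product with the query vector.
    def dot(c):
        return sum(a * b for a, b in zip(query_vec, c["vector"]))
    dense_ids = [c["id"] for c in sorted(chunks, key=dot, reverse=True)[:top_k]]

    # Keyword side: shared-token count stands in for BM25; drop zero-overlap chunks.
    q_tokens = set(query.lower().split())
    def overlap(c):
        return len(q_tokens & set(c["text"].lower().split()))
    keyword_ranked = sorted(chunks, key=overlap, reverse=True)[:top_k]
    keyword_ids = [c["id"] for c in keyword_ranked if overlap(c) > 0]

    return rrf([dense_ids, keyword_ids])

result = toy_hybrid("ERR_4012 on checkout", [0.9, 0.4])
print(result)  # ['c1', 'c2']
```

The dense side alone puts c2 (the paraphrase) first, and the keyword side alone finds only c1 (the exact code). After fusion, c1 leads because both rankings contain it, while c2 is still returned.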

Practical notes

Weighting: some vector stores let you weight dense against keyword results, and RRF's k constant also tunes the balance (a smaller k lets top-ranked items dominate). Start balanced.
Almost always a win: in real apps whose queries contain names, codes, or jargon, hybrid typically beats pure vector search.
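The weighting note can be made concrete with a small variation on RRF. This is a sketch under stated assumptions, not the API of any particular store: the `weights` parameter and the `weighted_rrf` name are hypothetical, and each list's contribution is simply scaled by its weight.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights=None, k=60):
    """RRF where list i contributes weights[i] / (k + rank) per ID."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)  # balanced by default
    scores = defaultdict(float)
    for ranked, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ids = ["a", "b", "c"]   # hypothetical dense ranking
bm25_ids = ["c", "d", "b"]    # hypothetical BM25 ranking

balanced = weighted_rrf([dense_ids, bm25_ids])
keyword_heavy = weighted_rrf([dense_ids, bm25_ids], weights=[1.0, 2.0])
print(balanced)       # ['c', 'b', 'a', 'd']
print(keyword_heavy)  # ['c', 'b', 'd', 'a']
```

Doubling the keyword weight lifts "d", which only BM25 found, above "a", which only the dense ranking found. Starting from equal weights and changing them only when evaluation shows one side is being drowned out keeps the tuning honest.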

