demo · strategies + composition

Three ways to find the right chunk

Dense embeddings catch paraphrase. BM25 catches rare keywords. Hybrid (Reciprocal Rank Fusion) catches both. Pick a query and see the same 12 chunks ranked three different ways. Switch to Composition to see the full RAG tradeoff: changing top-k ripples through the scores, the set of retrieved chunks, recall, and the production bill.

The three strategies

  • Dense — embed the query, embed each chunk, rank by cosine similarity. Captures paraphrase ("how does attention work" matches a chunk titled "Self-attention" even though the words don't overlap).
  • Sparse (BM25) — classic IR scoring over token overlap. Captures rare keywords (a query for "LoRA" nails the LoRA chunk because "LoRA" is rare in the corpus). Misses paraphrase entirely.
  • Hybrid (RRF) — Reciprocal Rank Fusion. For each chunk, sum 1/(60 + rank) across both strategies, where rank is the chunk's position in that strategy's list. The constant 60 is the standard pick from Cormack et al. (2009); tune it if you like. RRF gives a robust merge without having to calibrate the very different score scales of dense and sparse.
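
The RRF sum in the Hybrid bullet can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual code: the function name rrf_fuse, the chunk ids, and the two rankings are hypothetical, and ranks are assumed to start at 1.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant (60, as in the text above).
    Returns (chunk_id, fused_score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Assumption: ranks are 1-based, so the top chunk contributes 1/(k + 1).
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by chunk id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings, for illustration only.
dense_ranking = ["self-attention", "transformers", "lora", "bm25"]
sparse_ranking = ["transformers", "lora", "self-attention", "bm25"]

fused = rrf_fuse([dense_ranking, sparse_ranking])
for chunk_id, score in fused:
    print(f"{chunk_id:15s} {score:.5f}")
```

Only ranks enter the sum, never the raw cosine or BM25 scores, which is why no score calibration is needed. A larger k flattens the difference between adjacent ranks; a smaller k rewards top positions more strongly.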

What to look for

  1. Paraphrase wins. Try "How does self-attention work in transformers?" — dense correctly ranks the "Self-attention" chunk first; BM25 gets distracted by chunks that mention "transformer" more often.
  2. Rare keyword wins. Try "What's the difference between LoRA and full fine-tuning?" — BM25 absolutely nails the LoRA chunk because the keyword is a strong signal. Dense gets it too, but with less margin.
  3. Compare the rank table. When dense and sparse ranks disagree on a chunk, hybrid does the actual merging. When they agree, hybrid just confirms.
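
The rare-keyword effect in item 2 comes from BM25's inverse document frequency (IDF) term: a token that appears in few chunks gets a large weight. The sketch below uses a common Okapi BM25 formulation with assumed defaults k1 = 1.5 and b = 0.75; the function name bm25_scores, the three-chunk corpus, and the whitespace tokenization are hypothetical and much simpler than what a real index does.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))

    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Rare terms (small df) get a large IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus: "lora" is rare, "model" appears everywhere.
docs = [
    "lora adds low rank adapters to a frozen model".split(),
    "full fine-tuning updates every weight in the model".split(),
    "the transformer model stacks attention layers".split(),
]
scores = bm25_scores("lora model".split(), docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {' '.join(doc)}")
```

In this toy corpus "model" occurs in every chunk, so it contributes little to any score, while "lora" occurs in one chunk and dominates that chunk's score. The same mechanism explains why BM25 misses paraphrase: a chunk that shares no tokens with the query scores zero.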

What this leaves out

A real RAG stack adds: a reranker (a small cross-encoder that reads query + chunk together; dramatically better top-1 quality), filtering by metadata (date, source, permission), query rewriting (HyDE, FLARE), and evaluation on a golden set with retrieval metrics such as recall@k plus faithfulness metrics. This demo shows the core retrieval step in isolation, the place where most teams get the most lift for the least effort.

Anchored to 09-rag/rag-fundamentals and 09-rag/hybrid-search-and-reranking.