Cross-encoders fix what bi-encoders break
Initial retrieval gives you the top-100 candidates. A cross-encoder reranker reads each (query, chunk) pair together and re-orders them. This is often the single biggest quality lever in production RAG.
Bi-encoder vs cross-encoder — what's actually different
# bi-encoder (initial retrieval, FAST):
emb_q = encoder(query) # encoded ONCE per query
emb_d = encoder(doc) # encoded ONCE per doc, cached at index time
score = cos_sim(emb_q, emb_d) # cheap dot product per candidate
# cross-encoder (rerank, SLOW but ACCURATE):
score = encoder(query, doc) # query and doc go in together
# the encoder cross-attends across them
# → much richer score, but a fresh forward
# pass per pair
The cross-encoder catches things the bi-encoder misses because it can attend across the query and the chunk together. Common example: "Did Apple sue Samsung?" A bi-encoder might highly rank a paragraph that merely mentions apples and Samsung together; a cross-encoder catches that the chunk doesn't actually answer the legal question.
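The structural difference can be sketched in plain Python. Both scorers here are toy stand-ins, not trained models: `encode` is a hypothetical bag-of-words "bi-encoder", and `cross_score` is a hypothetical pair scorer that rewards query terms appearing close together in the chunk. The example texts are made up for illustration. The only point is the shape of the computation: `encode` never sees the other side of the pair, while `cross_score` reads both at once.

```python
import math
from collections import Counter

PUNCT = ".,?!:;\"'"

def tokenize(text):
    # Lowercase and strip basic punctuation; enough for a toy example.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def encode(text):
    # Toy "bi-encoder" (assumption): one vector per text, built independently.
    return Counter(tokenize(text))

def cos_sim(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cross_score(query, doc):
    # Toy "cross-encoder" (assumption): sees the pair jointly, so it can use
    # WHERE the query terms fall in the doc, not just whether they occur.
    q_terms = set(tokenize(query))
    d_tokens = tokenize(doc)
    hits = [i for i, t in enumerate(d_tokens) if t in q_terms]
    if not hits:
        return 0.0
    matched = len({d_tokens[i] for i in hits})
    coverage = matched / len(q_terms)             # share of query terms found
    density = matched / (hits[-1] - hits[0] + 1)  # how tightly they cluster
    return coverage * density

QUERY = "Did Apple sue Samsung?"
ON_TOPIC = ("Court filings show that Apple did in fact sue Samsung "
            "over several smartphone design patents.")
OFF_TOPIC = "Apple or Samsung? Apple phones, Samsung phones."

for name, doc in [("on-topic", ON_TOPIC), ("off-topic", OFF_TOPIC)]:
    bi = cos_sim(encode(QUERY), encode(doc))
    cross = cross_score(QUERY, doc)
    print(f"{name:9s}  bi={bi:.3f}  cross={cross:.3f}")
```

With these toy inputs, the bag-of-words cosine prefers the off-topic chunk because it repeats "Apple" and "Samsung", while the pair scorer prefers the on-topic one. A real cross-encoder learns far richer interactions than term proximity, but the division of labour is the same.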
Try this — predict before you click
- Pick the first scenario. Look at the bi-encoder ranking vs the cross-encoder ranking. Predict: the rankings differ on at least 1–2 chunks. The "lift" stat shows the gain.
- Pick a scenario where dense retrieval already had the right answer at #1. Predict: cross-encoder rerank doesn't change the top result — but it does shuffle middle ranks (5–10). Reranking is most useful when the bi-encoder is unsure between several plausible candidates.
- Look at scenarios where the dense top-1 is wrong. Predict: these are the cases where cross-encoder rerank pays off most. Production RAG stacks pull the top-50 from dense and rerank to top-5 — the rerank's quality lift on those middle ranks is the difference between "answer is in context" and "answer is missing".
- The trade-off: a cross-encoder is roughly 100× slower per pair than a dense cosine score. Predict: production systems use dense retrieval to get the top-50 cheaply, then rerank only those 50 with the cross-encoder. Total latency = one dense pass + 50 cross-encoder passes, far less than the 50K cross-encoder passes it would take to score every chunk in a 50K-chunk index.
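The two-stage pattern from the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production implementation: `retrieve_then_rerank`, `coarse_score`, and `exact_score` are hypothetical names, the "documents" are just integers, and the two scorers are numeric stand-ins for dense cosine and a cross-encoder. The index size of 50,000 mirrors the 50K figure in the text.

```python
def retrieve_then_rerank(query, docs, dense_score, pair_score,
                         k_retrieve=50, k_final=5):
    # Stage 1: cheap score over the whole index, keep the top k_retrieve.
    candidates = sorted(docs, key=lambda d: dense_score(query, d),
                        reverse=True)[:k_retrieve]
    # Stage 2: expensive score on the candidates only, keep the top k_final.
    reranked = sorted(candidates, key=lambda d: pair_score(query, d),
                      reverse=True)
    return reranked[:k_final]

calls = {"dense": 0, "pair": 0}

def coarse_score(query, doc):
    # Stand-in for dense retrieval (assumption): cheap but blurry.
    # It cannot tell apart documents that fall in the same bucket of 10.
    calls["dense"] += 1
    return -abs(doc // 10 - query // 10)

def exact_score(query, doc):
    # Stand-in for a cross-encoder (assumption): precise, but imagine each
    # call costing on the order of 100x a dense comparison.
    calls["pair"] += 1
    return -abs(doc - query)

index = list(range(50_000))   # toy index: 50K "chunks"
target = 1234                 # toy "query"

dense_top1 = max(index, key=lambda d: coarse_score(target, d))
calls["dense"] = 0            # reset so the counts below cover one pipeline run

top5 = retrieve_then_rerank(target, index, coarse_score, exact_score)
print("dense top-1:", dense_top1)
print("reranked top-5:", top5)
print("dense calls:", calls["dense"], "| pair calls:", calls["pair"])
```

In this toy setup the coarse scorer alone puts a near-miss at rank 1, and the rerank pulls the exact match to the top while invoking the expensive scorer only 50 times instead of 50,000. The choice of `k_retrieve` is the knob: a larger value gives the reranker more chances to rescue a buried answer, at a linear cost in expensive passes.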
Anchored to 09-rag/hybrid-search-and-reranking.
Code-side: /ship/08 — retrieval (BM25 + dense + rerank).