Tune the BM25 ↔ dense blend for hybrid RAG
Hybrid retrieval combines keyword search (BM25) and dense vector search, blended by a weight alpha. Picking alpha by gut feel leaves quality on the table. This tool takes your labelled eval scores and sweeps every alpha from 0 to 1, reporting the value that maximises your chosen retrieval metric — so you ship the blend your own data prefers. It runs in your browser.
How it works
For each query, the candidates’ BM25 and dense scores are min-max normalised to 0–1 (so neither scale dominates), then combined:
score = alpha × dense_norm + (1 − alpha) × bm25_norm
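The normalise-then-blend step can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual code: the function names `min_max` and `blend` are hypothetical, and returning 0.0 when every candidate has the same score is an assumption the text above does not specify.

```python
def min_max(scores):
    """Rescale one query's candidate scores to the 0-1 range."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: with identical scores there is no signal, so use 0.0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def blend(bm25, dense, alpha):
    """score = alpha * dense_norm + (1 - alpha) * bm25_norm, per candidate."""
    bm25_norm = min_max(bm25)
    dense_norm = min_max(dense)
    return [alpha * d + (1 - alpha) * b for b, d in zip(bm25_norm, dense_norm)]


# Three candidates for one query (made-up scores on very different scales).
bm25 = [12.0, 7.0, 2.0]
dense = [0.2, 0.9, 0.55]
blended = blend(bm25, dense, alpha=0.5)
print(blended)  # approximately [0.5, 0.75, 0.25]
```

Normalising per query, not across the whole eval set, keeps one query with unusually large BM25 scores from compressing every other query's range.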
Candidates are re-ranked by the blended score and the chosen metric is computed:
- MRR: mean of 1 / rank of the first relevant result per query.
- Hit@K: fraction of queries with a relevant result in the top K.
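As a rough sketch of the two metrics, assume each query has already been re-ranked and is represented as a list of booleans in rank order (True marks a relevant result). The function names are illustrative, and scoring a query with no relevant result as 0 is a common convention assumed here, not something the tool is confirmed to do.

```python
def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0  # Assumption: queries with no relevant result score 0.


def mrr(queries):
    """Mean reciprocal rank over all queries."""
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def hit_at_k(queries, k):
    """Fraction of queries with at least one relevant result in the top k."""
    return sum(any(q[:k]) for q in queries) / len(queries)


queries = [
    [False, True, False],   # first relevant result at rank 2
    [True, False, False],   # first relevant result at rank 1
    [False, False, False],  # no relevant result retrieved
]
print(mrr(queries))          # (0.5 + 1.0 + 0.0) / 3 = 0.5
print(hit_at_k(queries, 2))  # 2 of 3 queries hit, about 0.667
```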
The tuner evaluates alpha at fine steps across [0, 1] and returns the alpha
with the best average metric, plus a small table showing how the metric varies
so you can see how sensitive your system is to the choice.
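The sweep itself might look like the following sketch, which uses MRR as the metric. Several details are assumptions made for illustration: the input layout (one list of `(bm25, dense, is_relevant)` tuples per query), the step count of 20 (alpha in increments of 0.05), and breaking ties in favour of the lowest alpha. The tool's real step size and tie-breaking are not stated above.

```python
def min_max(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # assumption: no spread means no signal
    return [(s - lo) / (hi - lo) for s in scores]


def reciprocal_rank(candidates, alpha):
    """Blend, re-rank, and return 1 / rank of the first relevant candidate."""
    bm25 = min_max([c[0] for c in candidates])
    dense = min_max([c[1] for c in candidates])
    blended = [alpha * d + (1 - alpha) * b for b, d in zip(bm25, dense)]
    order = sorted(range(len(candidates)), key=lambda i: blended[i], reverse=True)
    for rank, i in enumerate(order, start=1):
        if candidates[i][2]:
            return 1.0 / rank
    return 0.0


def sweep(queries, steps=20):
    """Evaluate MRR at alpha = 0, 1/steps, ..., 1 and return the best alpha."""
    table = []
    for i in range(steps + 1):
        alpha = i / steps  # integer stepping avoids floating-point drift
        score = sum(reciprocal_rank(q, alpha) for q in queries) / len(queries)
        table.append((alpha, score))
    # max() keeps the first maximum, so ties resolve to the lowest alpha.
    best_alpha, best_score = max(table, key=lambda row: row[1])
    return best_alpha, best_score, table


# Made-up eval data: (bm25_score, dense_score, is_relevant) per candidate.
queries = [
    [(10.0, 0.6, True), (4.0, 0.9, False), (1.0, 0.1, False)],  # keyword-leaning
    [(5.0, 0.8, True), (9.0, 0.4, False), (2.0, 0.2, False)],   # semantic-leaning
]
best_alpha, best_score, table = sweep(queries)
print(best_alpha, best_score)  # 0.5 1.0
```

On this toy data the metric is 0.75 at both extremes and reaches 1.0 only for alpha between 0.5 and 0.6, which is the kind of peak the sensitivity table is meant to expose.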
Tips and notes
- A flat curve means alpha barely matters — pick a value for stability and move on. A sharp peak means the blend really matters; lock it in.
- Make sure each query’s candidate list includes the relevant doc(s); otherwise no alpha can rank them and the metric is artificially low.
- This optimises the linear blend only. If you use reciprocal rank fusion (RRF) instead,
the intuition (balance keyword vs semantic) still holds, but tune RRF’s
k separately.
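For comparison, here is a minimal sketch of reciprocal rank fusion, which ignores raw scores and sums 1 / (k + rank) over each ranker's list. The function name `rrf` and the document ids are hypothetical. The default k = 60 is the value commonly used in practice, offered here as an assumption and not as a recommendation from this tool.

```python
def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["a", "b", "c"]   # keyword ranker's order, best first
dense_ranking = ["b", "c", "a"]  # dense ranker's order, best first
fused = rrf([bm25_ranking, dense_ranking])
print(fused)  # ['b', 'a', 'c']
```

Because RRF uses only ranks, it has no alpha to tune. A larger k flattens the gap between top and lower ranks, so k plays a different role from the blend weight and needs its own sweep.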