Compare your retrieval pipeline before and after reranking
In a typical RAG pipeline, you first retrieve a broad set of candidates with fast vector similarity search, then pass the top results through a reranker, usually a cross-encoder that scores each query-document pair jointly and is generally more precise than embedding similarity alone. This tool lets you paste both score sets side by side so you can see exactly which documents the reranker promoted, which it demoted, and whether the final ordering is genuinely better.
How it works
Enter one candidate per line in the form: label, initial_score, reranker_score. The tool sorts the list twice, once by the initial (vector) score and once by the reranker score, and lines up the two rankings. For each document it computes the rank delta (how many positions it moved) and the score delta. Documents the reranker pushed up are highlighted as promotions; documents it pushed down are demotions.
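The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the names `Movement`, `parse_candidates`, and `compare_rankings` are hypothetical, and the sketch assumes that higher scores are better, that labels are unique, and that ties keep their input order.

```python
from dataclasses import dataclass


@dataclass
class Movement:
    label: str
    initial_rank: int
    reranked_rank: int
    rank_delta: int      # positive = promoted, negative = demoted
    score_delta: float   # reranker score minus initial score


def parse_candidates(text: str) -> list[tuple[str, float, float]]:
    """Parse 'label, initial_score, reranker_score' lines; blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # rsplit from the right so commas inside the label itself are preserved.
        label, initial, reranked = (part.strip() for part in line.rsplit(",", 2))
        rows.append((label, float(initial), float(reranked)))
    return rows


def compare_rankings(rows: list[tuple[str, float, float]]) -> list[Movement]:
    """Rank rows by each score column and report how far each label moved."""
    # Assumption: higher score = better. sorted() is stable, so ties keep input order.
    by_initial = sorted(rows, key=lambda r: r[1], reverse=True)
    by_reranked = sorted(rows, key=lambda r: r[2], reverse=True)
    initial_rank = {label: pos for pos, (label, _, _) in enumerate(by_initial, start=1)}

    movements = []
    for pos, (label, initial, reranked) in enumerate(by_reranked, start=1):
        before = initial_rank[label]
        movements.append(Movement(label, before, pos, before - pos, reranked - initial))
    return movements  # ordered by reranked position


sample = """
doc_a, 0.91, 2.1
doc_b, 0.88, 7.4
doc_c, 0.80, -1.3
doc_d, 0.75, 5.0
"""

for m in compare_rankings(parse_candidates(sample)):
    status = "promoted" if m.rank_delta > 0 else "demoted" if m.rank_delta < 0 else "unchanged"
    print(f"{m.label}: {m.initial_rank} -> {m.reranked_rank} ({m.rank_delta:+d}, {status})")
```

With the made-up sample above, doc_b and doc_d are promoted (from positions 2 and 4 to 1 and 2) while doc_a and doc_c are demoted. Defining the rank delta as initial rank minus reranked rank is a convention chosen here so that a positive number means a promotion.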
If you also mark which documents are actually relevant, the tool computes NDCG@k (normalized discounted cumulative gain over the top k results) for both orderings. NDCG rewards putting relevant documents near the top, with a logarithmic discount for lower positions, and is normalized so that a perfect ordering scores 1. The before/after numbers therefore give you a concrete measure of reranking quality rather than a gut feeling.
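For readers who want to check the numbers by hand, here is a minimal sketch of NDCG@k. It assumes binary relevance (a document is either marked relevant or not) and the common log2 position discount; the tool may use a different gain or discount, and the function names `dcg_at_k` and `ndcg_at_k` are illustrative only.

```python
import math


def dcg_at_k(gains: list[float], k: int) -> float:
    """Discounted cumulative gain: the gain at position i is divided by log2(i + 1)."""
    return sum(gain / math.log2(pos + 1) for pos, gain in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_labels: list[str], relevant: set[str], k: int) -> float:
    """NDCG@k with binary gains (assumption): 1.0 if the label is relevant, else 0.0."""
    gains = [1.0 if label in relevant else 0.0 for label in ranked_labels]
    # The ideal ordering puts every relevant document first.
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents among the candidates
    return dcg_at_k(gains, k) / ideal_dcg


# Made-up example: the same four documents in two orderings.
initial_order = ["doc_a", "doc_b", "doc_c", "doc_d"]
reranked_order = ["doc_b", "doc_d", "doc_a", "doc_c"]
relevant = {"doc_b", "doc_d"}

print(f"NDCG@3 before: {ndcg_at_k(initial_order, relevant, 3):.3f}")   # 0.387
print(f"NDCG@3 after:  {ndcg_at_k(reranked_order, relevant, 3):.3f}")  # 1.000
```

In this example the reranked ordering places both relevant documents in the top two positions, so it matches the ideal ordering and scores 1, while the initial ordering only has one relevant document in its top three, at position 2.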
Tips and notes
The two score columns can be on completely different scales. That is fine for the comparison, because each ranking depends only on the order of scores within its own column. For the same reason, the score delta is only meaningful when both columns share a scale, so focus on the rank movements rather than the raw numbers. A healthy reranker typically reshuffles the top results noticeably; if nothing moves, your candidates may already be well-ordered or the reranker may not be adding value for this query. When NDCG barely changes, check whether your relevance labels are correct and whether the reranker is suited to your domain.