What does a reranker actually do?

A reranker is usually a cross-encoder model that scores each query-document pair jointly, rather than comparing pre-computed embeddings. It is slower but usually more accurate, so you retrieve a large candidate set with vectors and then rerank the top 20-50 to surface the most relevant results.

Why are reranker scores on a different scale than my vector scores?

Cosine similarity is bounded between -1 and 1, while reranker relevance scores depend on the model: raw logits are unbounded, and sigmoid outputs fall between 0 and 1. This tool ranks by each column independently, so absolute scale does not matter for the rank comparison.

What is NDCG and why use it here?

Normalized Discounted Cumulative Gain measures how well the top results match your known-relevant documents, weighting higher positions more heavily. Comparing NDCG before and after reranking gives you a single number for how much the reranker improved ordering.

How many candidates should I rerank?

Retrieve 50-100 candidates with fast vector search, then rerank the top 20-50. Reranking everything is expensive and unnecessary; reranking too few risks the best document never reaching the cross-encoder.

What is the Reranker Score Comparator?

The Reranker Score Comparator is a tool for checking what reranking did to your retrieval results. Paste the results with their initial scores and reranker scores to visualize rank changes, score deltas, and NDCG@k improvement, and see exactly which documents the reranker promoted or demoted. It runs free in your browser on Gera Tools, with nothing uploaded.

Reranker Score Comparator

Name: Reranker Score Comparator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Compare your retrieval pipeline before and after reranking

In a modern RAG pipeline you usually retrieve a broad set of candidates with fast vector similarity, then pass the top results through a reranker — a cross-encoder that scores each query-document pair jointly for much higher precision. This tool lets you paste both score sets side by side so you can see exactly which documents the reranker promoted, which it demoted, and whether the final ordering is genuinely better.

How it works

Enter one candidate per line as label, initial_score, reranker_score. The tool sorts the list twice — once by the initial (vector) score and once by the reranker score — and lines up the two rankings. For each document it computes the rank delta (how many positions it moved) and the score delta. Documents the reranker pushed up are highlighted as promotions; documents it pushed down are demotions.
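The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual source: the names `Row`, `parse_candidates`, and `compare` are hypothetical, ties are assumed to keep their input order, and the sign convention (positive rank delta means promoted) follows the worked example on this page.

```python
from dataclasses import dataclass


@dataclass
class Row:
    label: str
    initial_score: float
    reranker_score: float
    initial_rank: int = 0
    reranked_rank: int = 0

    @property
    def rank_delta(self) -> int:
        # Positive = moved up (promoted), negative = moved down (demoted).
        return self.initial_rank - self.reranked_rank

    @property
    def score_delta(self) -> float:
        # Raw difference; the two scores may be on different scales.
        return self.reranker_score - self.initial_score


def parse_candidates(text: str) -> list[Row]:
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split from the right so a label may itself contain commas.
        label, initial, reranked = line.rsplit(",", 2)
        rows.append(Row(label.strip(), float(initial), float(reranked)))
    return rows


def compare(text: str) -> list[Row]:
    rows = parse_candidates(text)
    # sorted() is stable, so tied scores keep their input order (an assumption).
    by_initial = sorted(rows, key=lambda r: r.initial_score, reverse=True)
    for rank, row in enumerate(by_initial, start=1):
        row.initial_rank = rank
    by_reranker = sorted(rows, key=lambda r: r.reranker_score, reverse=True)
    for rank, row in enumerate(by_reranker, start=1):
        row.reranked_rank = rank
    return by_reranker
```

Calling `compare("faq, 0.91, 0.20\nreturns, 0.70, 0.95\nshipping, 0.80, 0.10")` returns the rows in reranked order, with `returns` first and a rank delta of +2.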

If you also mark which documents are actually relevant, the tool computes NDCG@k for both orderings. NDCG rewards putting relevant documents near the top, with a logarithmic discount for lower positions, so the before/after numbers give you a concrete measure of reranking quality rather than a gut feeling.
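For readers who want to check the numbers by hand, here is a small sketch of NDCG@k. It assumes the common linear-gain form (relevance divided by log2 of position plus one) and computes the ideal ordering from the same pasted candidates; the tool itself may use a different gain function, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances: list[float], k: int) -> float:
    # Position 1 has discount log2(2) = 1; lower positions count for less.
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: list[float], k: int) -> float:
    """relevances: labels (e.g. 0 or 1) listed in ranked order, best rank first."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # With no relevant documents marked, NDCG is undefined; return 0.0 here.
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

To compare orderings, build one relevance list in vector-score order and another in reranker-score order, then call `ndcg_at_k` on each with the same `k`.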

Understanding the score columns

Vector similarity scores (cosine, dot product) and cross-encoder reranker scores live on different scales and have different meanings. Cosine similarity is bounded between -1 and 1, where 1 means the vectors point in the same direction and 0 means they are orthogonal; a raw dot product is only bounded this way when the vectors are normalized. Cross-encoder scores are often raw logits, which are unbounded, or sigmoid probabilities between 0 and 1, depending on the model. This tool treats each column independently for ranking purposes: it ranks candidates within each column separately, so the absolute scale difference does not matter.
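One way to see why scale does not matter: a strictly increasing transform such as the sigmoid changes the numbers but not their order. The snippet below is a small illustration with made-up logits and hypothetical document labels, not output from any real model.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ranking(scores: dict[str, float]) -> list[str]:
    # Labels ordered from highest score to lowest.
    return sorted(scores, key=lambda label: scores[label], reverse=True)


# Made-up cross-encoder logits for three hypothetical documents.
logits = {"doc_a": 4.2, "doc_b": -1.3, "doc_c": 0.7}
probabilities = {label: sigmoid(value) for label, value in logits.items()}

# The values differ, but the order of the documents is identical.
same_order = ranking(logits) == ranking(probabilities)
```

Here `same_order` is `True`, so it makes no difference to the rank comparison whether you paste logits or probabilities from the same model.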

What matters is the rank, not the raw score. A document ranked 1st by vector similarity but 6th by the reranker has been demoted by 5 positions — that is signal that the embedding similarity did not correlate with actual relevance for this query.

Worked example

Suppose you retrieve 10 documents for the query “refund policy for digital downloads”. Vector similarity ranks the FAQ overview page 1st, plausibly because the page is saturated with refund-related language and its embedding sits close to the query. The cross-encoder reranker, which reads the query and document together, demotes it to 4th and promotes a specific page about digital purchase returns from 4th to 1st, because that page directly answers the question even though its embedding similarity is lower. The comparator makes this swap visible: one row shows a -3 rank delta (demoted), another shows +3 (promoted). If you mark the digital-returns page as relevant, NDCG@3 rises because the relevant document moved from outside the top 3 into position 1.
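The NDCG@3 change in this example can be worked out by hand. The arithmetic below assumes the digital-returns page is the only document marked relevant, that it sat at rank 4 before reranking (consistent with the +3 delta), and that the tool uses the linear-gain form of DCG.

```latex
\begin{align*}
\mathrm{DCG@}k &= \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k} \\[4pt]
\mathrm{IDCG@3} &= \frac{1}{\log_2 2} = 1
  && \text{(ideal order: relevant page at rank 1)} \\
\mathrm{NDCG@3}_{\text{before}} &= \frac{0}{1} = 0
  && \text{(relevant page at rank 4, outside the top 3)} \\
\mathrm{NDCG@3}_{\text{after}} &= \frac{1/\log_2 2}{1} = 1
  && \text{(relevant page at rank 1)}
\end{align*}
```

With a cutoff of k = 4 the "before" value would instead be 1/log2(5), roughly 0.43, which shows how much the choice of k affects the reported improvement.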

Tips and notes

The two score columns can be on completely different scales — that is fine, because ranking is scale-invariant within each column. Focus on the rank movements, not the raw numbers. A healthy reranker typically reshuffles the top results noticeably; if nothing moves, your candidates may already be well-ordered or the reranker may not be adding value for this query. When NDCG barely changes, consider whether your relevance labels are correct or whether the reranker is suited to your domain.

How many candidates to rerank

The retrieve-then-rerank pattern works because the two steps optimise for different things. Vector search is fast and approximate: it retrieves a large candidate set quickly using precomputed embeddings. The cross-encoder is slow (it runs a forward pass for every query-document pair) but precise. A typical production setup retrieves 50–100 candidates with vector search, then reranks the top 20–50 of those. Reranking all 100 is expensive and rarely improves quality meaningfully; reranking only the top 5 risks missing a good document that landed at position 6. The comparator helps you tune this window: if the top-k NDCG stops improving beyond a certain cutoff, that cutoff is a reasonable reranking depth for the queries you tested.
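A depth sweep like this can be approximated offline. The sketch below is one possible approach, not the tool's implementation: it assumes you already have reranker scores and relevance labels for every retrieved candidate, reranks only the top `depth` items by vector score, leaves the rest in vector order, and reports NDCG@k for each depth. The function names are hypothetical.

```python
import math


def ndcg_at_k(relevances, k):
    def dcg(rels):
        return sum(r / math.log2(i + 1) for i, r in enumerate(rels[:k], start=1))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def ndcg_by_depth(candidates, k, depths):
    """candidates: list of (vector_score, reranker_score, relevance) tuples."""
    by_vector = sorted(candidates, key=lambda c: c[0], reverse=True)
    results = {}
    for depth in depths:
        # Rerank only the head; the tail keeps its vector-search order.
        head = sorted(by_vector[:depth], key=lambda c: c[1], reverse=True)
        final = head + by_vector[depth:]
        results[depth] = ndcg_at_k([c[2] for c in final], k)
    return results


# Made-up scores: the only relevant document sits 4th by vector similarity.
candidates = [
    (0.9, 0.10, 0),
    (0.8, 0.20, 0),
    (0.7, 0.30, 0),
    (0.6, 0.95, 1),
    (0.5, 0.05, 0),
]
sweep = ndcg_by_depth(candidates, k=3, depths=[3, 4, 5])
```

In this toy data, NDCG@3 is 0.0 at depth 3 and 1.0 at depths 4 and 5: the relevant document never reaches the reranker until the window is at least 4 wide, and widening it further adds cost without changing the result.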