How is relevance scored?

Each sentence is scored by the number of distinct query keywords it contains, divided by the number of distinct keywords in the query. Stop words are ignored so common words do not inflate scores.

Will this beat a real reranker?

No. Lexical overlap is a cheap heuristic and cannot match an embedding-based reranker or a cross-encoder for semantic relevance. It is a fast, free first pass that removes obviously off-topic sentences before a smarter step.

Why prune context at all?

Retrieved chunks often carry boilerplate and tangential sentences that waste tokens and dilute the model's attention. Pruning lowers cost and can improve answer quality by raising signal density.

Does anything leave my browser?

No. All scoring and filtering run client-side in JavaScript. Your query and context never leave the page.

What is the RAG Context Pruner?

A tool that scores each sentence of your retrieved RAG context by keyword overlap with the query, filters out everything below a threshold you control, and previews the pruned context with the token savings. It runs free in your browser on Gera Tools, with nothing uploaded.

RAG Context Pruner — Gera Tools

Name: RAG Context Pruner
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

RAG context pruner

Retrieval-augmented generation often dumps whole chunks into the prompt, and a lot of those sentences have nothing to do with the question. This tool scores every sentence by keyword overlap with your query, drops the ones below a threshold you set, and shows how many tokens you saved — a quick way to trim a bloated context window before you pay for it.

Why retrieved context is often noisier than it looks

Vector search retrieves chunks by semantic proximity, not by the exact relevance of every sentence within those chunks. A chunk is usually a fixed-size block of text — 256 or 512 tokens — that was split mechanically from a larger document. The embedding captures the general topic of the chunk, which can be close to your query without every sentence being relevant. Common sources of filler:

Topic transition sentences — “In the next section, we will discuss…”
Boilerplate and disclaimers — “This document is for informational purposes…”
Tangentially related context — background sentences that share query vocabulary but don’t actually answer the question
Repeated definitions — the same concept defined the same way across multiple retrieved chunks

These sentences consume real tokens and can dilute the model’s attention toward the information that actually answers the question.

How the scoring works

The query is split into lowercase keywords with common stop words (the, a, is, of, in, etc.) removed. Each sentence in the retrieved context is scored by the fraction of unique query keywords it contains:

sentence_score = unique_query_keywords_in_sentence / total_unique_query_keywords

This gives a 0–1 relevance value where 1 means the sentence contains every keyword and 0 means it contains none. Sentences scoring at or above your chosen threshold are kept in their original order; the rest are dropped. The token count for both the original and pruned context is estimated at ~4 characters per token so you can see the savings immediately.
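The scoring and filtering described above can be sketched in a few lines. This is a minimal Python illustration of the logic, not the tool's actual JavaScript: the stop-word list, the regex tokenizer, the naive sentence splitter, exact whole-word matching, and rounding the token estimate up are all assumptions made for the example.

```python
import math
import re

# Assumed stop-word list; the tool's real list is not given in this text.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "for", "to", "and", "or"}

def keywords(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def split_sentences(context):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]

def score(sentence, query_keywords):
    """Fraction of unique query keywords that appear in the sentence."""
    if not query_keywords:
        return 0.0
    return len(query_keywords & keywords(sentence)) / len(query_keywords)

def prune(context, query, threshold):
    """Keep sentences scoring at or above the threshold, in original order."""
    query_keywords = keywords(query)
    return [s for s in split_sentences(context) if score(s, query_keywords) >= threshold]

def estimate_tokens(text):
    """Rough estimate at about 4 characters per token (rounded up here)."""
    return math.ceil(len(text) / 4)

# Illustrative context, invented for this example.
query = "refund policy for digital downloads"
context = (
    "Refunds are covered by our refund policy. "
    "The company was founded in 1999. "
    "Digital downloads are eligible for a refund within 14 days."
)
kept = prune(context, query, threshold=0.25)
print(kept)
print(estimate_tokens(context), "->", estimate_tokens(" ".join(kept)))
```

With exact word matching, "Refunds" does not count as a match for "refund"; the first sentence is kept only because it also contains "refund" and "policy". A stemmer would change that, but nothing in the text says the tool uses one.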

Worked example

Query: “refund policy for digital downloads”

A retrieved chunk might contain 12 sentences. After scoring, the distribution might look like:

High scorers (kept at threshold 0.25, about 5 sentences): sentences containing at least one of the four query keywords “refund,” “policy,” “digital,” and “downloads” (“for” is a stop word, so one match out of four scores 0.25)
Low scorers (dropped): the opening paragraph about company history, a sentence about physical product returns, a boilerplate disclaimer

Result: the context drops from 350 tokens to roughly 140 tokens, a 60% reduction, while keeping the sentences that actually address the query. The model gets a denser, more focused signal.
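The savings figure in this example is simple arithmetic on the two token estimates. A small Python sketch, where the function name `savings` is made up for illustration:

```python
def savings(original_tokens, pruned_tokens):
    """Return (tokens saved, fraction of the original saved)."""
    saved = original_tokens - pruned_tokens
    return saved, saved / original_tokens

saved, fraction = savings(350, 140)
print(f"{saved} tokens saved ({fraction:.0%})")  # 210 tokens saved (60%)
```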

How to choose the right threshold

The threshold controls the precision/recall tradeoff:

0.0 – 0.1: keeps almost everything. At exactly 0 nothing is dropped; just above 0, only sentences with no query keywords are dropped for a short query. Use as a light pass to remove obvious filler.
0.1 – 0.25: removes most low-value sentences while still keeping sentences that share a single keyword with a short query (one match out of four keywords scores 0.25). Good starting point.
0.25 – 0.5: aggressive pruning; use only when you are severely token-constrained and confident the query terms appear in all relevant sentences.
Above 0.5: very aggressive; risks dropping relevant sentences that paraphrase the query rather than repeating its exact keywords.
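Because a score is always a whole number of matches divided by the number of query keywords, a threshold effectively translates into "at least k matching keywords". The Python sketch below makes that mapping explicit; `min_hits` is a hypothetical helper written for this illustration, not part of the tool.

```python
def min_hits(threshold, n_keywords):
    """Smallest number of matched keywords k such that k / n_keywords >= threshold.

    Returns None when no sentence can pass (empty query or threshold above 1).
    """
    if n_keywords == 0:
        return None
    for k in range(n_keywords + 1):
        if k / n_keywords >= threshold:
            return k
    return None

# For a four-keyword query such as "refund policy for digital downloads":
for t in (0.0, 0.1, 0.25, 0.5, 0.75):
    print(f"threshold {t}: needs {min_hits(t, 4)} of 4 keywords")
```

For a four-keyword query, thresholds of 0.1 and 0.25 behave identically (one match is enough), so moving the slider only changes the result when it crosses a multiple of 1/4. Longer queries have finer steps.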

Limits of lexical scoring

This is a keyword-overlap heuristic, not semantic search. It will:

Miss sentences that express the same idea with synonyms (“reimburse” vs “refund”)
Penalize correctly relevant sentences that avoid repeating the query terms
Over-reward sentences that happen to contain many query keywords but are not the best answer

Use it as a fast, free first pass to remove obvious filler — then apply an embedding reranker or manual review for the final selection.
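Two of these failure modes are easy to reproduce. The Python sketch below assumes exact whole-word matching and a small stop-word list, and both sentences are invented for illustration: a relevant sentence that uses a synonym scores 0, while a navigation-style sentence that happens to contain every keyword scores 1.

```python
import re

# Assumed stop-word list; the tool's real list is not given in this text.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "for", "to", "and", "or"}

def keywords(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def score(sentence, query_keywords):
    """Fraction of unique query keywords that appear in the sentence."""
    if not query_keywords:
        return 0.0
    return len(query_keywords & keywords(sentence)) / len(query_keywords)

query_keywords = keywords("refund policy for digital downloads")

# Relevant, but phrased with a synonym: no lexical overlap at all.
synonym_sentence = "We reimburse customers for purchases of software files."
# Contains every keyword, yet answers nothing.
stuffed_sentence = "This refund policy page lists digital downloads and other products."

print(score(synonym_sentence, query_keywords))  # 0.0
print(score(stuffed_sentence, query_keywords))  # 1.0
```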

Tips

Expand the query with synonyms to improve recall at a given threshold.
Watch for orphaned references. Dropping a sentence that defines a term used later in a kept sentence can leave dangling references — skim the output.
Lexical pruning is a pre-filter, not a replacement for a semantic reranker; it catches the easy wins at zero latency and cost.
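As a rough illustration of the synonym tip, the Python sketch below expands the query keywords from a small lookup table before scoring. The table, the `expand` helper, and the sample sentences are hypothetical; the tool itself may simply expect you to type the synonyms into the query.

```python
import re

# Assumed stop-word list; the tool's real list is not given in this text.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "for", "to", "and", "or"}
# Hypothetical synonym table, invented for this example.
SYNONYMS = {"refund": {"reimburse"}}

def keywords(text):
    """Lowercase the text and return its set of non-stop-word tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def expand(query_keywords, synonyms=SYNONYMS):
    """Return the query keywords plus any listed synonyms."""
    expanded = set(query_keywords)
    for word in query_keywords:
        expanded |= synonyms.get(word, set())
    return expanded

def score(sentence, query_keywords):
    """Fraction of unique query keywords that appear in the sentence."""
    if not query_keywords:
        return 0.0
    return len(query_keywords & keywords(sentence)) / len(query_keywords)

base = keywords("refund policy for digital downloads")  # 4 keywords
wide = expand(base)                                      # 5 keywords

synonym_sentence = "We reimburse customers within 14 days."
literal_sentence = "Refund requests are reviewed weekly."

print(score(synonym_sentence, base), score(synonym_sentence, wide))  # 0.0 0.2
print(score(literal_sentence, base), score(literal_sentence, wide))  # 0.25 0.2
```

Note the side effect: every added synonym enlarges the denominator, so a sentence that matched one of four keywords (0.25) now matches one of five (0.2). When expanding the query, it may be worth lowering the threshold a little so those single-match sentences are not lost.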