RAG Context Window Optimizer

Maximize relevant context in your prompt while staying within token limits.

Retrieval-augmented generation lives or dies on what you put in the context window. Stuff in too much and you blow the token budget or bury the signal; trim too aggressively and the model misses the answer. This tool takes your retrieved chunks — each with a token count and a relevance score — and packs the most valuable ones into a fixed token budget so every token earns its place.

How it works

Each chunk’s value density is its relevance score divided by its token count: how much relevance you buy per token. The optimizer sorts chunks by density, highest first, then greedily adds them in order, skipping any that would overflow the remaining budget. After the greedy pass it attempts one improvement sweep, checking whether any skipped chunk can replace a lower-value included chunk for a net gain in total relevance. This catches the common case where a large early pick blocked two smaller chunks that together carry more relevance. The result lists the selected chunks, the dropped ones, total tokens used, and the relevance captured.
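The steps above can be sketched in Python. This is a minimal illustration, not the tool's actual code: `Chunk`, `greedy_fill`, and `optimize` are hypothetical names, and the text does not pin down the exact swap rule. The sweep here assumes that a skipped chunk is forced in, one included chunk is dropped, and the leftover budget is refilled greedily, with the swap kept only if total relevance rises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str         # assumed unique per chunk
    tokens: int       # must be positive
    relevance: float  # e.g. a reranker score

    @property
    def density(self) -> float:
        # Relevance bought per token.
        return self.relevance / self.tokens


def total_relevance(chunks):
    return sum(c.relevance for c in chunks)


def greedy_fill(candidates, budget):
    """Add chunks in descending density order, skipping any that overflow."""
    selected, remaining = [], budget
    for chunk in sorted(candidates, key=lambda c: c.density, reverse=True):
        if chunk.tokens <= remaining:
            selected.append(chunk)
            remaining -= chunk.tokens
    return selected


def optimize(chunks, budget):
    if any(c.tokens <= 0 for c in chunks):
        raise ValueError("token counts must be positive")
    selected = greedy_fill(chunks, budget)
    best = total_relevance(selected)

    # One improvement sweep (assumed rule): force a skipped chunk in, drop one
    # included chunk, refill the rest greedily, and keep the result on a net gain.
    for skipped in chunks:
        if skipped in selected or skipped.tokens > budget:
            continue
        for included in selected:
            rest = [c for c in chunks if c != skipped and c != included]
            trial = [skipped] + greedy_fill(rest, budget - skipped.tokens)
            if total_relevance(trial) > best:
                selected, best = trial, total_relevance(trial)
                break

    return {
        "selected": selected,
        "dropped": [c for c in chunks if c not in selected],
        "tokens_used": sum(c.tokens for c in selected),
        "relevance": best,
    }


# Made-up example: greedy alone picks only A (relevance 6.6); the sweep finds B + C.
chunks = [Chunk("A", 6, 6.6), Chunk("B", 5, 5.0), Chunk("C", 5, 5.0)]
result = optimize(chunks, budget=10)
print([c.name for c in result["selected"]], result["relevance"])  # ['B', 'C'] 10.0
```

In this made-up example, chunk A has the highest density, so the greedy pass takes it first and leaves too little room for B or C. The sweep trades A for B, and the refill then has room for C.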

Tips and notes

  • Reserve headroom. Set the budget to your model’s window minus the system prompt, the user question, and the expected answer length — not the full context window.
  • Rerank first. The optimizer trusts your scores. Feeding it cross-encoder reranker scores rather than raw vector similarity usually improves selection.
  • Watch the dropped list. If a high-score chunk keeps getting dropped, it is likely too long — consider splitting it into smaller passages upstream.
  • Everything is local. No data leaves your browser.
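
The "Reserve headroom" tip comes down to a subtraction. The sketch below shows it with a hypothetical helper, `chunk_budget`, and illustrative numbers. None of the figures come from a specific model, and the token counts are assumed to be measured with your model's own tokenizer.

```python
def chunk_budget(context_window, system_prompt_tokens, question_tokens, answer_reserve_tokens):
    """Tokens left for retrieved chunks after reserving everything else."""
    budget = context_window - system_prompt_tokens - question_tokens - answer_reserve_tokens
    if budget <= 0:
        raise ValueError("nothing left for retrieved chunks; shrink the prompt or the answer reserve")
    return budget


# Illustrative numbers only: an 8192-token window, a 400-token system prompt,
# a 60-token question, and 1024 tokens reserved for the answer.
print(chunk_budget(8192, 400, 60, 1024))  # 6708
```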