Context Window Size vs Retrieval Quality Tradeoff

Model the cost-quality tradeoff as you retrieve more context chunks.


More retrieved context is not always better

In a RAG system it is tempting to retrieve “just a few more chunks” to be safe. But each chunk adds input tokens, so cost grows linearly, while quality shows diminishing returns. Past a certain point, the lost-in-the-middle effect (models tend to use material buried in the middle of a long prompt less reliably) means extra context can actually lower answer quality. This tool plots the cost against an illustrative quality curve so you can find the point of best value rather than defaulting to the maximum.

How it works

For each chunk count, the cost is straightforward: cost = (chunks × tokens_per_chunk × input_price) + (output_tokens × output_price), with both prices expressed per token. The output term is fixed, so cost rises in a straight line as chunks are added. The quality side uses the curve shape you choose. Logarithmic captures classic diminishing returns; plateau adds the lost-in-the-middle behaviour, where quality peaks around a middling chunk count and then declines; linear is the naive “more always helps” assumption, included for comparison. The tool then reports two points: the chunk count with the best quality per dollar, which is usually well below your maximum, and the chunk count where quality actually peaks.
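
The calculation described above can be sketched in a few lines of Python. The cost function follows the formula as stated; the three quality formulas, the `peak_fraction` parameter, and the default prices and token counts are illustrative assumptions, not the tool's actual internals.

```python
import math


def cost(chunks, tokens_per_chunk, input_price, output_tokens, output_price):
    """Cost of one request. Prices are per token."""
    return chunks * tokens_per_chunk * input_price + output_tokens * output_price


def quality(chunks, shape, max_chunks, peak_fraction=0.5):
    """Illustrative quality score (assumed formulas, not measured data)."""
    if shape == "linear":
        # Naive assumption: every extra chunk helps equally.
        return chunks / max_chunks
    if shape == "logarithmic":
        # Diminishing returns, normalised so that max_chunks scores 1.0.
        return math.log1p(chunks) / math.log1p(max_chunks)
    if shape == "plateau":
        # Rises logarithmically to a peak, then declines linearly
        # (assumed stand-in for the lost-in-the-middle behaviour).
        peak = max(1.0, peak_fraction * max_chunks)
        if chunks <= peak:
            return math.log1p(chunks) / math.log1p(peak)
        return max(0.0, 1.0 - 0.5 * (chunks - peak) / (max_chunks - peak))
    raise ValueError(f"unknown shape: {shape}")


def sweep(shape, max_chunks=20, tokens_per_chunk=500,
          input_price=3e-6, output_tokens=300, output_price=15e-6):
    """Return (best_value_chunks, peak_quality_chunks) for one curve shape.

    The default prices and token counts are hypothetical placeholders.
    """
    rows = []
    for n in range(1, max_chunks + 1):
        c = cost(n, tokens_per_chunk, input_price, output_tokens, output_price)
        q = quality(n, shape, max_chunks)
        rows.append((n, q, q / c))
    best_value = max(rows, key=lambda r: r[2])[0]    # highest quality per dollar
    peak_quality = max(rows, key=lambda r: r[1])[0]  # highest raw quality
    return best_value, peak_quality
```

Calling `sweep("plateau")` returns the two chunk counts the tool reports. Under these assumed curves the best-value point sits below the quality peak, because the fixed output cost is spread over more quality at first and then the flattening curve stops paying for the extra input tokens.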

Tips for efficient retrieval

  • Rerank, don’t pile on. A good reranker that surfaces the top 3–5 relevant chunks beats dumping 20 mediocre ones for both cost and quality.
  • Mind the position. Put the most important context at the start or end of the prompt to avoid the middle blind spot.
  • Deduplicate. Near-identical chunks waste tokens and crowd out genuinely new information.
  • Measure on your eval set. The curves here only illustrate the shape of the tradeoff. Find your real optimum by testing different chunk counts against graded answers.
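
Two of the tips above, deduplicating and minding the position, can be sketched as follows. This is a minimal illustration that assumes the chunks arrive already ranked best-first (for example from a reranker). Word-level Jaccard similarity and the 0.8 threshold are assumed choices for detecting near-duplicates, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two texts, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(chunks, threshold=0.8):
    """Drop chunks that are near-identical to a higher-ranked chunk.

    `chunks` is a list of texts ordered best-first.
    """
    kept = []
    for text in chunks:
        if all(jaccard(text, k) < threshold for k in kept):
            kept.append(text)
    return kept


def order_for_prompt(chunks):
    """Place the best chunks at the start and end, the weakest in the middle.

    Alternates best-first chunks between the front and the back of the prompt.
    """
    front, back = [], []
    for i, text in enumerate(chunks):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]
```

For five ranked chunks `a` to `e`, `order_for_prompt` yields `a, c, e, d, b`: the top two sit at the edges and the lowest-ranked chunk lands in the middle.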