More retrieved context is not always better
In a RAG system it is tempting to retrieve “just a few more chunks” to be safe. But each chunk adds input tokens, so cost grows linearly, while quality follows diminishing returns. Past a certain point the lost-in-the-middle effect, where models make less reliable use of material buried in the middle of a long prompt, means extra context can actually lower answer quality. This tool plots cost against an illustrative quality curve so you can find the point of best value rather than defaulting to the maximum.
How it works
For each chunk count, the cost is straightforward: (chunks × tokens_per_chunk × input_price) + (output_tokens × output_price), with both prices expressed per token. The output term is fixed, so cost rises in a straight line as chunks are added. The quality side uses the curve shape you choose. Logarithmic captures classic diminishing returns; plateau adds the lost-in-the-middle behaviour, where quality peaks around a middling chunk count and then declines; linear is the naive “more always helps” assumption, included for comparison. The tool then reports the best quality-per-dollar point, which is usually well below your maximum, and the chunk count where quality actually peaks.
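The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names, the specific curve formulas (a normalised `log1p` curve, and a plateau that peaks at half the maximum and then decays linearly), and the `peak_fraction` and `decay` parameters are all assumptions chosen only to reproduce the shapes described. The prices in the usage example are placeholders, not real rates.

```python
import math


def cost(chunks, tokens_per_chunk, input_price, output_tokens, output_price):
    """Cost of one request; both prices are per token."""
    return chunks * tokens_per_chunk * input_price + output_tokens * output_price


def quality(chunks, max_chunks, shape="logarithmic", peak_fraction=0.5, decay=0.3):
    """Illustrative quality score in [0, 1]. The formulas are assumed shapes."""
    if shape == "linear":
        # Naive "more always helps" baseline.
        return chunks / max_chunks
    if shape == "logarithmic":
        # Diminishing returns, normalised to reach 1.0 at max_chunks.
        return math.log1p(chunks) / math.log1p(max_chunks)
    if shape == "plateau":
        # Rises logarithmically to a peak, then declines (lost in the middle).
        peak = max(1, round(peak_fraction * max_chunks))
        if chunks <= peak:
            return math.log1p(chunks) / math.log1p(peak)
        return 1.0 - decay * (chunks - peak) / (max_chunks - peak)
    raise ValueError(f"unknown shape: {shape}")


def sweep(max_chunks, tokens_per_chunk, input_price, output_tokens,
          output_price, shape="plateau"):
    """Return (rows, best_value_chunks, peak_quality_chunks)."""
    rows = []
    for n in range(1, max_chunks + 1):
        c = cost(n, tokens_per_chunk, input_price, output_tokens, output_price)
        q = quality(n, max_chunks, shape)
        rows.append((n, c, q, q / c))  # last field: quality per dollar
    best_value = max(rows, key=lambda r: r[3])[0]
    peak_quality = max(rows, key=lambda r: r[2])[0]
    return rows, best_value, peak_quality


if __name__ == "__main__":
    # Placeholder prices per token (hypothetical, for illustration only).
    rows, best_n, peak_n = sweep(
        max_chunks=20, tokens_per_chunk=500, input_price=3e-6,
        output_tokens=300, output_price=15e-6, shape="plateau",
    )
    print(f"best quality-per-dollar at {best_n} chunks, quality peaks at {peak_n}")
```

With the plateau shape, the best-value chunk count lands below the quality peak, which in turn sits below the maximum. Swapping in the linear shape shows how the naive assumption pushes the answer toward retrieving as much as possible.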
Tips for efficient retrieval
- Rerank, don’t pile on. A good reranker that surfaces the top 3–5 relevant chunks beats dumping 20 mediocre ones for both cost and quality.
- Mind the position. Put the most important context at the start or end of the prompt to avoid the middle blind spot.
- Deduplicate. Near-identical chunks waste tokens and crowd out genuinely new information.
- Measure on your eval set. The curves here only illustrate the shape of the tradeoff. Find your real optimum by testing different chunk counts against graded answers.
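Two of the tips above, deduplication and positioning, can be sketched as small helpers. This is a rough illustration under stated assumptions: the function names are hypothetical, near-duplicates are detected with a simple word-level Jaccard similarity and an arbitrary 0.8 threshold (a production system might compare embeddings instead), and the input list is assumed to be already sorted by relevance, best first, for example by a reranker.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(ranked_chunks, threshold=0.8):
    """Drop chunks that are near-identical to a higher-ranked chunk.

    ranked_chunks must be ordered best first; the threshold is an assumption.
    """
    kept = []
    for chunk in ranked_chunks:
        if all(jaccard(chunk, k) < threshold for k in kept):
            kept.append(chunk)
    return kept


def place_at_edges(ranked_chunks):
    """Reorder so the strongest chunks sit at the start and end of the prompt,
    leaving the weakest ones in the middle."""
    front, back = [], []
    for i, chunk in enumerate(ranked_chunks):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


if __name__ == "__main__":
    ranked = [
        "Refunds are issued within 14 days of purchase.",
        "refunds are issued within 14 days of purchase.",
        "Shipping takes 3 to 5 business days.",
        "Gift cards cannot be refunded.",
    ]
    unique = deduplicate(ranked)
    print(place_at_edges(unique))
```

The ordering step alternates chunks between the front and the back of the prompt, so the top-ranked chunk comes first, the second-ranked comes last, and lower-ranked material fills the middle.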