RAG context budget calculator
When you build a retrieval-augmented generation (RAG) pipeline, the single hardest constraint is the context window. Every retrieved document chunk, your system prompt, the user’s question, and the space the model needs to answer all compete for the same fixed token budget. This calculator tells you how many chunks you can actually retrieve before you run out of room.
How it works
The context window is a fixed number of tokens for your chosen model. From that total the calculator subtracts three fixed costs: the system prompt, the user query, and a completion reserve for the model’s answer. Whatever is left is your retrieval budget.
Each chunk costs its base size plus its overlap, because overlap tokens are duplicated into the chunk and still occupy space when retrieved. Dividing the remaining budget by that effective chunk size gives the maximum number of chunks you can safely pass in. The calculator floors the result so you never plan for a partial chunk that would overflow the window.
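The arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `max_chunks`, its parameter names, and the sample numbers are all assumptions chosen for the example.

```python
def max_chunks(
    context_window: int,
    system_prompt: int,
    user_query: int,
    completion_reserve: int,
    chunk_size: int,
    chunk_overlap: int,
) -> int:
    """Return how many whole chunks fit in the retrieval budget (all values in tokens)."""
    if chunk_size <= 0 or chunk_overlap < 0:
        raise ValueError("chunk_size must be positive and chunk_overlap non-negative")

    # Subtract the three fixed costs from the model's context window.
    retrieval_budget = context_window - system_prompt - user_query - completion_reserve
    if retrieval_budget <= 0:
        return 0  # the fixed costs alone already fill the window

    # Each retrieved chunk occupies its base size plus its duplicated overlap.
    effective_chunk_size = chunk_size + chunk_overlap

    # Integer division floors the result, so a partial chunk is never counted.
    return retrieval_budget // effective_chunk_size


# Hypothetical numbers: 8192 - 300 - 100 - 1000 = 6792 tokens of budget,
# and each chunk costs 512 + 64 = 576 tokens.
print(max_chunks(8192, 300, 100, 1000, 512, 64))  # 11
```

With these sample inputs, 11 chunks use 6336 tokens and a 12th would need 6912, which exceeds the 6792-token budget.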
Tips and notes
- Reserve generously for the completion. A summarisation task might only need a few hundred output tokens, but an agent that writes code can need several thousand. Under-reserving causes truncated answers.
- Smaller chunks improve precision but add overhead. Many small chunks let retrieval target the exact passage, but more chunks mean more glue tokens (the separators and formatting between chunks) and more overlap waste. Use the chunking-strategy calculator to balance this.
- Token estimates are approximate. Tokenizers split text differently across models, so leave a safety margin rather than packing the window to the last token.
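One way to apply the safety margin from the last tip is to shrink the window by a percentage before budgeting. The sketch below assumes a 10% default margin purely for illustration; the function name `chunks_with_margin`, its parameters, and the sample numbers are hypothetical, and the right margin depends on how closely your token estimates match your model's tokenizer.

```python
def chunks_with_margin(
    context_window: int,
    fixed_costs: int,
    effective_chunk_size: int,
    margin_percent: int = 10,
) -> int:
    """Return the whole-chunk count after holding back a safety margin (all values in tokens)."""
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in the range 0..99")
    if effective_chunk_size <= 0:
        raise ValueError("effective_chunk_size must be positive")

    # Hold back a slice of the window to absorb tokenizer estimation error.
    # Integer arithmetic keeps the result a whole number of tokens.
    usable_window = context_window * (100 - margin_percent) // 100

    # fixed_costs = system prompt + user query + completion reserve.
    retrieval_budget = max(usable_window - fixed_costs, 0)

    # Floor to whole chunks, as in the main calculation.
    return retrieval_budget // effective_chunk_size


# Hypothetical numbers: an 8192-token window with 1400 tokens of fixed costs
# and 576-token effective chunks.
print(chunks_with_margin(8192, 1400, 576))                    # 10
print(chunks_with_margin(8192, 1400, 576, margin_percent=0))  # 11
```

In this example the 10% margin costs one chunk, which is the trade being made: slightly less retrieved context in exchange for headroom if the real token count runs higher than estimated.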