Token budget splitter
A model’s context window is a fixed budget that you have to divide among four competing demands: system instructions, retrieved context, the user’s message, and room for the reply. This tool takes your context window size and the three fixed slots, then tells you exactly how many tokens are left for retrieved context chunks, which is the slot that usually flexes in RAG systems.
How it works
You enter the total context window and three known quantities: system prompt tokens, reserved completion tokens, and your typical user message size. The splitter subtracts those from the window and reports the remaining budget available for retrieved context, plus a percentage breakdown of how the whole window is allocated. If your fixed slots already exceed the window, it flags the overflow and shows by how much you are over.
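The arithmetic is simple enough to sketch. The Python below is a minimal illustration of the subtraction, percentage breakdown, and overflow check described above. It is not the tool's actual implementation. The function name split_budget, the Split fields, and the choice to express percentages against the full window (so they add up to more than 100% on overflow) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Split:
    # Hypothetical result type for this sketch, not the tool's real output format.
    context_budget: int            # tokens left for retrieved context (0 if overflowing)
    overflow: int                  # tokens by which the fixed slots exceed the window
    percentages: dict[str, float]  # share of the whole window per slot, in percent


def split_budget(window: int, system: int, completion: int, user: int) -> Split:
    """Subtract the three fixed slots from the window; the remainder goes to retrieved context."""
    if window <= 0:
        raise ValueError("window must be positive")
    fixed = system + completion + user
    remaining = window - fixed
    context = max(0, remaining)
    slots = {"system": system, "completion": completion, "user": user, "context": context}
    # Percentages are taken against the whole window, so they exceed 100% on overflow.
    percentages = {name: 100 * tokens / window for name, tokens in slots.items()}
    return Split(context_budget=context, overflow=max(0, -remaining), percentages=percentages)


if __name__ == "__main__":
    print(split_budget(128_000, system=2_000, completion=4_000, user=1_000))
    print(split_budget(8_192, system=2_000, completion=6_000, user=500))
```

With a 128,000-token window and 2,000 / 4,000 / 1,000 tokens for the system prompt, completion reserve, and user message, 121,000 tokens remain for retrieved context (about 94.5% of the window). With an 8,192-token window and 2,000 / 6,000 / 500, the fixed slots total 8,500 tokens, so the sketch reports an overflow of 308.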
Tips and notes
This is the core sizing step for any RAG pipeline: it tells you how many chunks of a given size you can retrieve, or how large each chunk can be for a fixed count, before you blow the budget. Reserve completion tokens generously, because reasoning models emit hidden thinking tokens that count against the window. Leave 10–20% slack for token-estimate error and special tokens rather than packing the window to 100%. To validate a specific assembled prompt against a model, pair this with the context window planner.
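Turning the remaining budget into a chunk count takes one more division. The sketch below assumes the slack is taken as a percentage of the whole window and carved out of the retrieved-context slot, since that is the slot that flexes. The function names usable_context, max_chunks, and max_chunk_size and the 15% default are illustrative assumptions, not part of the tool.

```python
def usable_context(window: int, context_budget: int, slack_pct: int = 15) -> int:
    """Context tokens left after holding back slack_pct% of the whole window (assumed policy)."""
    if not 0 <= slack_pct < 100:
        raise ValueError("slack_pct must be in [0, 100)")
    reserve = -(-window * slack_pct // 100)  # ceiling division in exact integer arithmetic
    return max(0, context_budget - reserve)


def max_chunks(window: int, context_budget: int, chunk_tokens: int, slack_pct: int = 15) -> int:
    """How many chunks of chunk_tokens each fit in the usable context."""
    return usable_context(window, context_budget, slack_pct) // chunk_tokens


def max_chunk_size(window: int, context_budget: int, num_chunks: int, slack_pct: int = 15) -> int:
    """Largest per-chunk size, in tokens, if you always retrieve num_chunks chunks."""
    return usable_context(window, context_budget, slack_pct) // num_chunks


if __name__ == "__main__":
    # Example: a 128,000-token window with 121,000 tokens left for retrieved context.
    print(max_chunks(128_000, 121_000, chunk_tokens=512))   # 198
    print(max_chunk_size(128_000, 121_000, num_chunks=8))   # 12725
```

Holding back 15% of a 128,000-token window reserves 19,200 tokens, leaving 101,800 of the 121,000 for chunks. The ceiling division rounds the reserve up, so the slack never comes out smaller than the stated percentage.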