The context window budget planner helps you decide how to spend a finite resource: the tokens a model can hold in a single request. Every token you give to the system prompt, conversation history, or retrieved documents is a token you cannot give to the answer. This tool lets you set a model’s window size and divide it across segments so you can see, before you build, whether your design fits.
How it works
You choose a model preset (or type a custom window size), then enter token estimates for each segment: the system prompt, conversation history, documents or retrieved context, the user query, and a reserved output budget. The planner sums them, shows each segment as a percentage of the window, and reports the remaining headroom. If the total exceeds the window, it flags the overflow and tells you exactly how many tokens to cut.
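The arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the planner's actual source: the names `plan_budget` and `BudgetReport`, the segment keys, and the example figures are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class BudgetReport:
    total: int      # sum of all segment estimates
    shares: dict    # segment name -> percentage of the window
    headroom: int   # tokens left over (0 when the plan overflows)
    overflow: int   # tokens to cut (0 when the plan fits)


def plan_budget(window: int, segments: dict) -> BudgetReport:
    """Sum the segment estimates and compare them with the window size."""
    if window <= 0:
        raise ValueError("window must be a positive number of tokens")
    if any(tokens < 0 for tokens in segments.values()):
        raise ValueError("segment estimates cannot be negative")

    total = sum(segments.values())
    shares = {name: 100 * tokens / window for name, tokens in segments.items()}
    return BudgetReport(
        total=total,
        shares=shares,
        headroom=max(window - total, 0),
        overflow=max(total - window, 0),
    )


# Hypothetical estimates for one request.
example_segments = {
    "system_prompt": 1_500,
    "history": 20_000,
    "documents": 90_000,
    "query": 500,
    "output": 4_000,
}

fits = plan_budget(128_000, example_segments)      # total 116000, headroom 12000
too_small = plan_budget(16_000, example_segments)  # overflow 100000
```

Reporting headroom and overflow as two separate non-negative numbers mirrors the description: one says how much room is left, the other says how many tokens to cut.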
All of this is arithmetic that runs in your browser; nothing you enter is sent anywhere. Token counts are estimates. For exact figures, run your text through the tokenizer for your specific model, because GPT, Claude, and other families tokenize differently.
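If you have text but no tokenizer at hand, a character-based heuristic gives a rough starting estimate. The sketch below assumes about four characters per token, a commonly quoted rule of thumb for English prose; the function name `estimate_tokens` and the default ratio are assumptions, and the result can be well off for code, non-English text, or a particular model's tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (an approximation, not a tokenizer)."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    # Round up so short, non-empty strings never estimate to zero tokens.
    return math.ceil(len(text) / chars_per_token)


sample = "a" * 400
rough = estimate_tokens(sample)  # 400 characters at 4 per token -> 100
```

Treat the output as a planning figure only, and replace it with a real tokenizer count before relying on a tight budget.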
Tips and examples
A common mistake is forgetting that output shares the window. If a model has a 128k window and you fill 127k tokens of it with documents, only about 1k tokens remain for everything else, including the answer. Reserve a realistic output budget first, then fit the input around it.
For long chats, cap history with a sliding window or a periodic summary so it does not grow without bound. For retrieval-augmented prompts, the document segment is usually where most of the budget goes, so prefer fewer, more relevant chunks over many marginal ones.
Use the planner to compare a 16k model with a 128k model. The larger window does not just cost more: a design that overflows at 16k may fit at 128k with headroom to spare, which changes what is feasible.
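Two of these tips, reserving output first and capping history with a sliding window, can be sketched together. This is one possible approach under stated assumptions: the helper names `history_budget` and `sliding_window`, the per-message token counts, and the example figures are hypothetical.

```python
def history_budget(window: int, reserved_output: int, fixed_input: int) -> int:
    """Tokens left for history after reserving output and the fixed input segments."""
    return max(window - reserved_output - fixed_input, 0)


def sliding_window(message_tokens: list, budget: int) -> list:
    """Return indices of the most recent messages that fit in the budget, oldest first."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for index in range(len(message_tokens) - 1, -1, -1):
        cost = message_tokens[index]
        if used + cost > budget:
            break
        kept.append(index)
        used += cost
    kept.reverse()
    return kept


# Hypothetical 16k plan: 2000 tokens reserved for output, then
# system prompt (1500) + query (500) + documents (8000) as fixed input.
budget = history_budget(16_000, reserved_output=2_000, fixed_input=10_000)  # 4000

# Token estimates per chat message, oldest first.
messages = [1_200, 900, 1_500, 800, 1_100, 700]
kept_indices = sliding_window(messages, budget)  # [3, 4, 5]
```

Stopping at the first message that does not fit keeps the retained history contiguous, so the model never sees a conversation with a gap in the middle. A periodic summary of the dropped messages is the alternative mentioned above.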