Visualize your token budget before you hit the limit
Every model has a fixed context window — the total number of tokens it can read and write in a single call. That budget is shared across your system prompt, any retrieved context, the conversation history, the user’s question, and the model’s own reply. This tool lays all of those out on a single stacked bar so you can see at a glance whether you fit, and which section is eating the most space.
How it works
Pick your model to set the total token limit, then paste each part of your prompt into its own box. The tool estimates token counts using a character-based heuristic (about four characters per token for English), adds the output tokens you want to reserve for the reply, and renders a color-coded bar showing the share of the window each part consumes. If the total exceeds the limit, the bar turns red and the overflow amount is shown.
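The arithmetic behind the bar can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual source: the function names `estimate_tokens` and `budget_report`, the part labels, and the round-up behavior are all assumptions made for the example. Only the four-characters-per-token heuristic comes from the description above.

```python
import math

# Rough heuristic for English text. This is an approximation, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate the token count as characters / 4, rounded up (assumed rounding)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def budget_report(parts: dict[str, str], reserved_output: int, limit: int) -> dict:
    """Estimate each part, add the reserved output, and compare against the limit.

    `parts` maps a hypothetical label (e.g. "system prompt") to its pasted text.
    """
    counts = {name: estimate_tokens(text) for name, text in parts.items()}
    counts["reserved output"] = reserved_output
    total = sum(counts.values())
    return {
        "counts": counts,
        "total": total,
        # Share of the whole window each part consumes (drives the bar segments).
        "shares": {name: n / limit for name, n in counts.items()},
        # Positive only when the prompt no longer fits (the "red bar" case).
        "overflow": max(0, total - limit),
    }


report = budget_report(
    {"system prompt": "a" * 400, "question": "b" * 10},
    reserved_output=50,
    limit=200,
)
```

Here the 400-character system prompt is estimated at 100 tokens and the 10-character question at 3, so with 50 tokens reserved the total is 153 of 200 and nothing overflows. A real tokenizer will give different counts, especially for code or non-English text.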
Tips and example
In a retrieval app, the retrieved chunks are almost always the largest and most volatile component, so they are the first thing to trim when you approach the limit. A common pattern is to reserve output first (say 2,000 tokens), subtract the fixed system prompt, and then fit as many ranked chunks as the remaining budget allows. Because token estimation is approximate, keep a safety margin of 10 to 15% below the hard limit; hitting the ceiling exactly risks truncation or an API error. If you consistently overflow, summarize old history or move to a larger-context model rather than blindly dropping chunks.
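The reserve-then-pack pattern described above might look like the following in Python. Treat it as a sketch under stated assumptions: `fit_chunks` and its parameters are hypothetical names, the chunk sizes are assumed to be token estimates already sorted best-first, and the margin is expressed as a whole percentage so the arithmetic stays in integers.

```python
def fit_chunks(
    chunk_tokens: list[int],
    limit: int,
    system_tokens: int,
    reserved_output: int = 2000,
    other_tokens: int = 0,
    margin_percent: int = 10,
) -> list[int]:
    """Return the indices of the ranked chunks that fit in the remaining budget.

    chunk_tokens   -- estimated size of each chunk, best-ranked first (assumption)
    other_tokens   -- history plus user question, if any (hypothetical parameter)
    margin_percent -- safety margin kept below the hard limit
    """
    # Shrink the window by the safety margin before spending anything.
    usable = limit * (100 - margin_percent) // 100
    # Reserve the reply first, then subtract the fixed parts of the prompt.
    remaining = usable - reserved_output - system_tokens - other_tokens

    kept: list[int] = []
    for index, size in enumerate(chunk_tokens):
        if size > remaining:
            break  # stop at the first chunk that does not fit, preserving rank order
        kept.append(index)
        remaining -= size
    return kept


kept = fit_chunks([1500, 1500, 1500, 1500], limit=8000, system_tokens=500)
```

With an 8,000-token window and a 10% margin, 7,200 tokens are usable. After reserving 2,000 for output and 500 for the system prompt, 4,700 remain, so the first three 1,500-token chunks fit and the fourth is dropped. Stopping at the first chunk that does not fit is one design choice; skipping it and trying smaller, lower-ranked chunks would fill the window more tightly at the cost of strict rank order.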