Prompt Token Budget Planner

Allocate tokens across system, context, and output within your context window.

Plan your token budget before you call

Every LLM call has to fit inside a fixed context window. This planner helps you divide that window sensibly: reserve tokens for the output, account for your system prompt, and see how much room is left for user context and retrieved documents. A visual bar turns red the moment you overflow.

How the budget works

For most models the context window is shared by input and output. The planner subtracts in this order:

remaining_for_context = window − output_reserve − system_prompt

If that number goes negative, the system prompt and output reserve alone already exceed the window, so the request won’t fit. Because the model’s reply lives in the same window, reserving too little output truncates the answer, while reserving too much starves your context. The bar shows the four segments (system, context, output, and free) in proportion to the window.
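
As a rough illustration, the subtraction and the four bar segments could be sketched in Python like this. The `Budget` class and its field names are hypothetical and not taken from the planner itself. The sketch also assumes "free" means whatever part of remaining_for_context your context has not used yet.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Hypothetical container for one call's token budget (all values in tokens)."""
    window: int          # total context window of the model
    system_prompt: int   # tokens used by the system prompt
    context: int         # tokens of user context and retrieved documents
    output_reserve: int  # tokens set aside for the model's reply

    @property
    def remaining_for_context(self) -> int:
        # Same order as the formula: window - output_reserve - system_prompt
        return self.window - self.output_reserve - self.system_prompt

    @property
    def free(self) -> int:
        # Assumption: "free" is the part of the window nothing has claimed yet.
        return self.remaining_for_context - self.context

    @property
    def overflows(self) -> bool:
        # Negative free space means the request does not fit in the window.
        return self.free < 0

    def segments(self) -> dict[str, float]:
        # Share of the window taken by each bar segment (free is clamped at 0).
        return {
            "system": self.system_prompt / self.window,
            "context": self.context / self.window,
            "output": self.output_reserve / self.window,
            "free": max(self.free, 0) / self.window,
        }


budget = Budget(window=8192, system_prompt=500, context=3000, output_reserve=1024)
print(budget.remaining_for_context)  # 6668
print(budget.free)                   # 3668
print(budget.overflows)              # False
```

When the call fits, the four shares add up to the whole window. When it overflows, `free` is clamped to zero and the other three shares sum to more than 1, which is the situation the red bar signals.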

Tips

  • Reserve output first. Decide how long the answer needs to be, then spend what’s left on input.
  • Keep the system prompt lean. It’s billed and counted on every call; move large stable references into retrieved context instead.
  • Leave a 5–10% margin. Token estimates are approximate and chat templates add tokens of their own, so the real count usually comes out somewhat higher than your math.
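
One possible way to combine these tips is sketched below in Python. It sets the safety margin aside first, then the output reserve and system prompt, and returns what is left for context. The function name `plan_context_budget` and the whole-number `margin_percent` parameter are assumptions made for illustration. Taking the margin as a share of the full window is also just one reasonable reading of the tip.

```python
def plan_context_budget(
    window: int,
    output_reserve: int,
    system_prompt: int,
    margin_percent: int = 10,
) -> int:
    """Return the tokens left for context after a safety margin is set aside.

    Hypothetical helper: the margin is a whole-number percentage of the full
    window, rounded up so the estimate errs on the safe side.
    """
    # Integer ceiling division avoids floating-point rounding surprises.
    margin = -((-window * margin_percent) // 100)
    return window - margin - output_reserve - system_prompt


# Reserve output first (1024), keep the system prompt lean (500), leave 10%.
print(plan_context_budget(8192, output_reserve=1024, system_prompt=500))  # 5848

# A tighter 5% margin frees up more room for context.
print(plan_context_budget(8192, 1024, 500, margin_percent=5))  # 6258
```

A negative result means the output reserve, system prompt, and margin together already exceed the window.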