Completion length predictor
Output tokens usually cost more than input tokens, and they are the part you cannot see until after you pay for them. Always budgeting at max_tokens overstates your bill; ignoring output entirely understates it. This predictor reads your prompt and task type to estimate a realistic output length and cost, so your projections sit close to reality.
How it works
Each task type carries a characteristic output profile: summaries shrink the input, Q&A answers are short, and code and long-form writing expand well beyond the prompt. The tool starts from the task profile, then adjusts using two signals in your prompt: its length and any explicit length instruction such as “in one paragraph” or “list ten items.” It returns low, expected, and high token estimates and prices the expected completion against your model’s output rate. Everything runs in your browser.
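The steps above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the profile ratios, the per-unit token counts, the spread values, and the names PROFILES, explicit_length, and predict are all assumptions chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical task profiles: (output/input ratio, floor, ceiling) in tokens.
# Illustrative placeholders only, not the tool's real values.
PROFILES = {
    "summary": (0.25, 50, 800),
    "qa": (0.10, 30, 300),
    "code": (2.0, 150, 4000),
    "longform": (3.0, 400, 6000),
}

# Assumed rough token counts per unit named in a length instruction.
TOKENS_PER_UNIT = {"sentence": 25, "paragraph": 120, "item": 20}

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

LENGTH_PATTERN = re.compile(
    r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\s+(sentence|paragraph|item)s?\b"
)


@dataclass
class Estimate:
    low: int
    expected: int
    high: int
    expected_cost: float


def explicit_length(prompt: str) -> Optional[int]:
    """Return the token count implied by a length instruction, if any."""
    match = LENGTH_PATTERN.search(prompt.lower())
    if match is None:
        return None
    count_text, unit = match.groups()
    count = int(count_text) if count_text.isdigit() else NUMBER_WORDS[count_text]
    return count * TOKENS_PER_UNIT[unit]


def predict(prompt: str, task: str, input_tokens: int,
            output_price_per_million: float) -> Estimate:
    """Start from the task profile, then adjust for signals in the prompt."""
    ratio, floor, ceiling = PROFILES[task]
    hinted = explicit_length(prompt)
    if hinted is not None:
        # An explicit instruction overrides the profile and narrows the range.
        expected, spread = hinted, 0.3
    else:
        # Otherwise scale by input length, clamped to the profile's bounds.
        expected = min(max(round(input_tokens * ratio), floor), ceiling)
        spread = 0.6
    low = max(1, round(expected * (1 - spread)))
    high = round(expected * (1 + spread))
    cost = expected * output_price_per_million / 1_000_000
    return Estimate(low, expected, high, cost)
```

Calling predict("Summarize this report in one paragraph.", "summary", 2000, 10.0) takes the explicit-instruction branch, while the same call without "in one paragraph" falls back to the input-length ratio. Only the expected value is priced, in line with the description above.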
Tips and notes
The single biggest accuracy gain comes from telling the model how long to be: an instruction like “answer in two sentences” both shortens the output and makes it more predictable. Use the high estimate to set a sensible max_tokens cap and the expected estimate for cost budgeting; they serve different purposes. Output cost often dominates total cost on generative tasks, so this estimate matters more than input counting for chat and code workloads. Measure your real completions over a few hundred calls and adjust your assumptions accordingly; nothing beats your own data for refining the forecast.
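One way to act on the last tip is to derive both numbers from logged completion lengths. The sketch below is an assumption about how you might do this, not part of the tool: it uses the mean for budgeting and a nearest-rank 95th percentile plus headroom for the max_tokens cap, and the names nearest_rank, calibrate, and headroom_pct are hypothetical.

```python
import math
from statistics import mean


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct in 1..100)."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def calibrate(observed_output_tokens, output_price_per_million, headroom_pct=10):
    """Turn measured completion lengths into a budget figure and a cap."""
    if not observed_output_tokens:
        raise ValueError("need at least one observed completion")
    values = sorted(observed_output_tokens)
    expected = mean(values)          # use for cost budgeting
    p95 = nearest_rank(values, 95)   # use as the basis for the cap
    return {
        "expected_tokens": expected,
        "cost_per_call": expected * output_price_per_million / 1_000_000,
        # Cap sits above the 95th percentile so typical outputs are not cut off.
        "max_tokens": math.ceil(p95 * (100 + headroom_pct) / 100),
    }
```

Keeping the two outputs separate mirrors the advice above: the cap protects against runaway completions, while the mean drives the cost projection.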