Token Budget Negotiator for Claude

Set an optimal budget_tokens parameter for Claude extended thinking


Claude’s extended thinking lets the model reason before it answers, controlled by a budget_tokens parameter. Too low and hard problems get under-reasoned; too high and you pay for thinking you do not need. This tool recommends a budget that fits your task and your cost tolerance.

How it works

The recommendation combines three inputs:

  1. Complexity baseline. Simple tasks start around 1,024 thinking tokens, medium around 4,000, hard around 10,000 — reflecting Anthropic’s guidance that harder reasoning warrants a larger budget.
  2. Quality-vs-cost preference. A slider scales that baseline by a multiplier, from about 0.7× at the cost end to roughly 1.6× at the quality end.
  3. max_tokens safety. The recommended budget plus your expected answer length must fit inside max_tokens. The tool computes a suggested max_tokens and flags if the answer would be squeezed out.

It then estimates the thinking cost at your output price, since thinking tokens are billed as output tokens.
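The three inputs and the cost estimate can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code. The function name, the linear interpolation between ×0.7 and ×1.6, the rounding to multiples of 256, and the floor of 1,024 tokens are all assumptions, chosen because they reproduce the figures in the worked example.

```python
# Illustrative sketch only: names, interpolation and rounding are assumptions.
BASELINES = {"simple": 1024, "medium": 4000, "hard": 10000}
MIN_MULT, MAX_MULT = 0.7, 1.6  # cost end and quality end of the slider
STEP = 256                     # assumed rounding granularity
MARGIN = 512                   # safety margin used in the worked example
MIN_BUDGET = 1024              # assumed floor for the thinking budget


def recommend(complexity, slider, answer_tokens, output_price_per_mtok):
    """Return a suggested budget, max_tokens and thinking cost.

    slider runs from 0.0 (cheapest) to 1.0 (highest quality).
    output_price_per_mtok is the output price in USD per million tokens.
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0 and 1")
    multiplier = MIN_MULT + slider * (MAX_MULT - MIN_MULT)
    raw = BASELINES[complexity] * multiplier
    # Round to the nearest multiple of STEP, then apply the floor.
    budget = max(MIN_BUDGET, int(raw / STEP + 0.5) * STEP)
    # The budget, the visible answer and a margin must all fit in max_tokens.
    max_tokens = budget + answer_tokens + MARGIN
    # Thinking tokens are billed at the output rate.
    cost = budget * output_price_per_mtok / 1_000_000
    return {
        "budget_tokens": budget,
        "max_tokens": max_tokens,
        "thinking_cost_usd": cost,
    }
```

Calling recommend("hard", 0.5, 1500, 15.0) under these assumptions gives a budget of 11,520 tokens and a max_tokens of 13,532.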

Worked example

A hard multi-step task, expecting a 1,500-token answer, slider at “balanced” (midpoint), output price $15/1M:

  • Baseline (hard): 10,000 thinking tokens
  • Balanced multiplier: ×1.15 → 11,500, rounded to 11,520 recommended budget
  • Suggested max_tokens: 11,520 + 1,500 + 512 margin = 13,532
  • Thinking cost per call: 11,520 × $15 / 1e6 ≈ $0.17

Slide all the way toward cost (×0.7) and the budget drops to about 6,912 (7,000 before rounding), cutting the thinking cost to roughly $0.10. That is sensible if your evals show no quality loss.
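To show where these numbers end up, here is a hedged sketch that turns a budget and a max_tokens value into request parameters for extended thinking. The helper name thinking_request_kwargs is hypothetical, and the two checks reflect the constraints as we understand them: a minimum budget of 1,024 tokens and a budget strictly below max_tokens. The code only builds a dictionary and makes no API call.

```python
def thinking_request_kwargs(budget_tokens, max_tokens):
    """Build the thinking-related request parameters (hypothetical helper)."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens should be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


# Values from the worked example: 11,520 thinking tokens, 13,532 max_tokens.
kwargs = thinking_request_kwargs(11_520, 13_532)

# With the official Python SDK these would be passed roughly as
#   client.messages.create(model=..., messages=[...], **kwargs)
# where the model name and messages are yours to supply.
```

Validating before the call catches the most common mistake, a budget that leaves no room for the visible answer, without spending any tokens.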

Tips

  • Start from the recommendation, then tune on your own evals — raise the budget only while answer quality keeps improving.
  • Keep budget_tokens comfortably below max_tokens; the visible answer needs room.
  • Reserve large budgets for genuinely hard reasoning; simple extraction or formatting tasks barely benefit.
  • Model full spend with the LLM API Cost Calculator.
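
The first tip, raising the budget only while quality keeps improving, can be expressed as a simple stopping rule. The sketch below is one possible version. The function name, the min_gain threshold, and the scores in the example are made up for illustration and are not measurements of any model.

```python
def smallest_sufficient_budget(scores_by_budget, min_gain=0.01):
    """Pick the largest budget that still bought a worthwhile gain.

    scores_by_budget maps budget_tokens to an eval score (higher is better).
    Walking budgets in ascending order, stop at the first step whose
    improvement over the currently chosen budget is below min_gain.
    """
    budgets = sorted(scores_by_budget)
    if not budgets:
        raise ValueError("need at least one (budget, score) pair")
    chosen = budgets[0]
    for candidate in budgets[1:]:
        gain = scores_by_budget[candidate] - scores_by_budget[chosen]
        if gain < min_gain:
            break
        chosen = candidate
    return chosen


# Placeholder scores, invented purely to exercise the function.
example_scores = {2048: 0.70, 4096: 0.78, 8192: 0.80, 16384: 0.805}
best = smallest_sufficient_budget(example_scores)
```

With these placeholder scores the rule stops at 8,192, because doubling the budget again adds less than the 0.01 threshold. Pick min_gain to match how much quality you are willing to pay for.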