Claude’s extended thinking lets the model reason before it answers, controlled by
a budget_tokens parameter. Too low and hard problems get under-reasoned; too
high and you pay for thinking you do not need. This tool recommends a budget that
fits your task and your cost tolerance.
How it works
The recommendation combines three inputs:
- Complexity baseline. Simple tasks start around 1,024 thinking tokens, medium around 4,000, hard around 10,000 — reflecting Anthropic’s guidance that harder reasoning warrants a larger budget.
- Quality-vs-cost preference. A slider scales that baseline up (toward quality) or down (toward cost) by up to roughly 1.6×.
- max_tokens safety. The recommended budget plus your expected answer length
must fit inside
max_tokens. The tool computes a suggestedmax_tokensand flags if the answer would be squeezed out.
It then estimates the thinking cost at your output price, since thinking tokens are billed as output tokens.
Worked example
A hard multi-step task, expecting a 1,500-token answer, slider at “balanced” (midpoint), output price $15/1M:
- Baseline (hard): 10,000 thinking tokens
- Balanced multiplier: ×1.15 → rounded to 11,520 recommended budget
- Suggested max_tokens: 11,520 + 1,500 + 512 margin ≈ 13,500
- Thinking cost per call: 11,520 × $15 / 1e6 ≈ $0.17
Slide all the way toward cost (×0.7) and the budget drops to about 6,912, cutting the thinking cost to roughly $0.10 — sensible if your evals show no quality loss.
Tips
- Start from the recommendation, then tune on your own evals — raise the budget only while answer quality keeps improving.
- Keep
budget_tokenscomfortably belowmax_tokens; the visible answer needs room. - Reserve large budgets for genuinely hard reasoning; simple extraction or formatting tasks barely benefit.
- Model full spend with the LLM API Cost Calculator.