What is the thinking token budget in Claude?

When extended thinking is enabled, Claude reasons internally before answering. The budget_tokens parameter caps how many tokens that internal reasoning can use. A larger budget lets the model think more on hard problems, at higher output-token cost.

How big should the budget be?

Anthropic suggests starting around 1,024 tokens minimum and increasing for harder tasks. Simple tasks rarely need more than a couple thousand; complex multi-step reasoning can justify 8,000 or more. This tool scales the recommendation by complexity and your preference.

Does the thinking budget count as output tokens?

Yes. Thinking tokens are billed as output tokens and count toward your max_tokens limit, so budget_tokens must be smaller than max_tokens with room left for the visible answer. The tool warns if the answer would not fit.

Will a bigger budget always give a better answer?

No. There are diminishing returns. Past the point where the model has fully reasoned through the problem, extra budget is wasted spend. Increase it only until quality stops improving on your evals.

Can I set the budget too low?

Yes. If the budget is too small for the task, Claude may truncate its reasoning and produce a weaker answer. Match the budget to genuine task difficulty rather than defaulting to the minimum on hard problems.

What is the Token Budget Negotiator for Claude?

Free helper for Claude extended thinking. Recommends a thinking token_budget from task complexity, expected answer length and your quality-vs-cost preference, with the estimated cost impact. Runs entirely in your browser. It runs free in your browser on Gera Tools, with nothing uploaded.

Token Budget Negotiator for Claude

Name: Token Budget Negotiator for Claude
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Claude’s extended thinking lets the model reason before it answers, controlled by a budget_tokens parameter. Too low and hard problems get under-reasoned; too high and you pay for thinking you do not need. This tool recommends a budget that fits your task and your cost tolerance.

What the thinking budget controls

When extended thinking is enabled, Claude works through a problem internally — exploring multiple angles, checking its own reasoning, and revising its approach before producing the final answer. This internal reasoning is invisible in the response but is billed as output tokens, because the model generates those tokens even if you never see them.

The budget_tokens parameter sets a cap on how many of these thinking tokens Claude can use. A budget that is too small forces Claude to truncate its reasoning mid-thought, which often produces weaker answers on hard problems. A budget that is too large wastes money on reasoning that the model has already completed.

How it works

The recommendation combines three inputs:

Complexity baseline. Simple tasks start around 1,024 thinking tokens, medium around 4,000, hard around 10,000 — reflecting Anthropic’s guidance that harder reasoning warrants a larger budget.
Quality-vs-cost preference. A slider scales that baseline up (toward quality) or down (toward cost) by up to roughly 1.6×.
max_tokens safety. The recommended budget plus your expected answer length must fit inside max_tokens. The tool computes a suggested max_tokens and flags if the answer would be squeezed out.

It then estimates the thinking cost at your output price, since thinking tokens are billed as output tokens.

Worked example

A hard multi-step task, expecting a 1,500-token answer, slider at “balanced” (midpoint), output price $15/1M:

Baseline (hard): 10,000 thinking tokens
Balanced multiplier: ×1.15 → rounded to 11,520 recommended budget
Suggested max_tokens: 11,520 + 1,500 + 512 margin ≈ 13,500
Thinking cost per call: 11,520 × $15 / 1e6 ≈ $0.17

Slide all the way toward cost (×0.7) and the budget drops to about 6,912, cutting the thinking cost to roughly $0.10 — sensible if your evals show no quality loss.

When extended thinking helps — and when it does not

Extended thinking improves performance most on tasks that involve multi-step planning, mathematical reasoning, logic puzzles, code debugging with many interacting variables, and decisions that require weighing trade-offs. For these, a generous budget pays off.

It adds little value for tasks where the answer is direct: simple questions, short extractions, text reformatting, translation, or factual lookups. Running extended thinking at a large budget on these tasks is pure cost overhead.

A practical approach: enable thinking at a medium budget on your hardest 20% of prompts, and disable it on the straightforward 80%. The tool’s worked example lets you see the cost difference before committing to a configuration.

Tips

Start from the recommendation, then tune on your own evals — raise the budget only while answer quality keeps improving.
Keep budget_tokens comfortably below max_tokens; the visible answer needs room.
The Anthropic-documented minimum is 1,024 tokens; values below this are not guaranteed to work.
Model full spend across your entire workload with the LLM API Cost Calculator once you have settled on a budget.