What is a reasoning model?
A reasoning model is a large language model trained to think before it answers — it generates a long internal chain of reasoning, checks its own work, and only then produces a final response. Examples include OpenAI’s o1 and o3 and Gemini’s “thinking” modes. The key idea is test-time compute scaling: giving the model more computation at the moment you ask makes it markedly better at hard, multi-step problems.
How they differ from standard LLMs
A standard LLM like GPT-4o or Claude starts writing its answer straight away, generating it token by token with no separate reasoning phase. That makes it fast and cheap, and great for chat, drafting and summarising. A reasoning model instead spends a variable amount of effort “thinking” first:
- it produces many hidden reasoning tokens (an internal scratchpad),
- it can explore alternatives, catch its own mistakes, and backtrack,
- then it emits a concise final answer.
This is why reasoning models score far higher on competition maths, hard coding and logic benchmarks — and why they are slower and more expensive.
Test-time compute, briefly
For years, models got better mainly by scaling training. Reasoning models add a second axis: scaling inference. Letting a model “think longer” on a single hard question can beat a much larger model that answers instantly. You are, in effect, buying accuracy with compute at the time of the query.
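To make "buying accuracy with compute" concrete, here is a minimal sketch of one simple form of test-time compute: sampling several answers and taking a majority vote. This is an illustrative stand-in, not the hidden chain-of-thought mechanism described above. It assumes an idealised setting with a yes/no question, independent samples, and a made-up per-sample accuracy `p`; the function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent samples is correct.

    Assumptions (illustrative only): each sample is right with probability p,
    samples are independent, the answer is binary, and n is odd so there are
    no ties.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    needed = n // 2 + 1  # smallest number of correct samples that wins the vote
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    # More samples means more inference compute spent on the same question.
    for n in (1, 3, 5, 11, 21):
        print(f"samples={n:>2}  accuracy={majority_vote_accuracy(0.6, n):.3f}")
```

Under these assumptions, accuracy climbs as `n` grows even though the underlying model never changes. Real models do not produce independent samples, so actual gains are usually smaller than this toy calculation suggests.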
When to use a reasoning model
| Use a reasoning model for | Use a standard model for |
|---|---|
| Hard maths and proofs | Everyday chat and Q&A |
| Complex, multi-file coding | Drafting and rewriting |
| Multi-step planning and analysis | Summarising and extraction |
| Tricky logic and debugging | High-volume, latency-sensitive tasks |
The cost trade-off
Because the hidden reasoning is usually billed as output tokens, a single reasoning call can cost several times more than a standard one and take longer to return. Reach for reasoning when correctness on a genuinely hard problem matters more than speed or price — and estimate the difference with the LLM API Cost Calculator.
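As a rough illustration of that arithmetic, the sketch below compares one call with and without hidden reasoning tokens. Every number is a placeholder rather than a real provider's rate, and the names `Pricing` and `call_cost` are hypothetical. It assumes reasoning tokens are billed at the output rate, as described above, and uses the same price for both calls to isolate the effect of the extra tokens.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-token prices in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def call_cost(
    pricing: Pricing,
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int = 0,
) -> float:
    """Estimate the cost of one call in USD.

    Assumption: hidden reasoning tokens are billed as output tokens.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (
        input_tokens * pricing.input_per_million
        + billed_output * pricing.output_per_million
    )
    return total / 1_000_000


# Placeholder prices and token counts, chosen only to show the calculation.
pricing = Pricing(input_per_million=2.0, output_per_million=8.0)
standard = call_cost(pricing, input_tokens=1_000, visible_output_tokens=500)
reasoning = call_cost(
    pricing,
    input_tokens=1_000,
    visible_output_tokens=500,
    reasoning_tokens=4_000,
)

if __name__ == "__main__":
    print(f"standard:  ${standard:.4f}")
    print(f"reasoning: ${reasoning:.4f}")
    print(f"ratio:     {reasoning / standard:.1f}x")
```

With these made-up figures the visible answer is identical in both calls, yet the reasoning call costs several times more because the hidden tokens dominate the bill. In practice, reasoning models are often priced differently per token as well, which would widen or narrow the gap.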