The idea in one sentence
Chain-of-thought (CoT) reasoning is the technique of getting a language model to write out its intermediate steps before committing to an answer. Instead of jumping straight to a result, the model “thinks aloud”, and laying out the working can substantially improve accuracy on problems that require several steps, such as arithmetic, logic puzzles, and word problems.
Why thinking step by step works
A language model generates text one token at a time, and each token it writes becomes part of the context for the next. When a hard problem is answered in a single leap, the model has no room to do the intermediate computation — it has to compress all the reasoning into one opaque jump and frequently gets it wrong. Chain-of-thought gives the model that room: by producing the steps explicitly, it can break the problem into smaller sub-problems, carry intermediate results forward, and check itself along the way. The reasoning tokens act like scratch paper.
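The feedback loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular model is implemented: `generate`, `make_scripted_model`, and the `<eos>` stop token are hypothetical names introduced here, and the stand-in "model" simply replays a fixed list of tokens.

```python
from typing import Callable, List


def generate(
    prompt: List[str],
    next_token: Callable[[List[str]], str],
    stop: str = "<eos>",
    max_new_tokens: int = 64,
) -> List[str]:
    """Autoregressive decoding: each new token is appended to the context
    that the (hypothetical) next_token function sees on the following step."""
    context = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(context)
        if token == stop:
            break
        context.append(token)  # the token just written becomes input for the next step
    return context[len(prompt):]


def make_scripted_model(prompt_len: int, script: List[str]) -> Callable[[List[str]], str]:
    """Stand-in for a real model (an assumption for illustration): replays
    `script` one token at a time, then emits the stop token."""
    def next_token(context: List[str]) -> str:
        position = len(context) - prompt_len
        return script[position] if position < len(script) else "<eos>"
    return next_token


prompt = ["Q:", "What", "is", "17", "*", "6", "?"]
# Intermediate steps written out before the final answer.
steps = ["17*6", "=", "10*6", "+", "7*6", "=", "60", "+", "42", "=", "102"]

model = make_scripted_model(len(prompt), steps)
output = generate(prompt, model)
print(" ".join(output))
```

By the time the final token is produced, every intermediate step is already sitting in `context`, which is the sense in which the reasoning tokens act like scratch paper.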
Zero-shot, few-shot, and tree-of-thought
The original CoT work showed that including a few worked examples with their reasoning (few-shot CoT) let models solve grade-school math problems they previously failed. A follow-up found that examples are often unnecessary: simply appending “Let’s think step by step” to the prompt elicits similar step-by-step reasoning, an approach known as zero-shot CoT. More advanced variants like tree-of-thought explore several reasoning branches, evaluate them, and backtrack from dead ends, trading extra compute for better performance on search and planning tasks. Use the demo above to compare a direct answer against zero-shot and few-shot CoT on the same problem.
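To make the difference between the prompting styles concrete, here is a minimal sketch of how the three prompts might be assembled as plain strings. The `Q:`/`A:` layout, the function names, and the worked example are assumptions chosen for illustration; real prompts vary by model and task, and no model is called here.

```python
from typing import List, Tuple

ZERO_SHOT_TRIGGER = "Let's think step by step."


def direct_prompt(question: str) -> str:
    """Ask for the answer with no reasoning cue."""
    return f"Q: {question}\nA:"


def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no examples, just the trigger phrase."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


def few_shot_cot_prompt(question: str, examples: List[Tuple[str, str, str]]) -> str:
    """Few-shot CoT: worked examples (question, reasoning, answer) come first,
    followed by the new question with the answer left open."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in examples
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# A hypothetical worked example; in practice you would supply several.
EXAMPLES = [
    (
        "Tom has 3 boxes with 4 pens each. How many pens does he have?",
        "Each box has 4 pens and there are 3 boxes, so 3 * 4 = 12.",
        "12",
    ),
]

question = "A train travels 60 km per hour for 2.5 hours. How far does it go?"
for build in (direct_prompt, zero_shot_cot_prompt):
    print(build(question), end="\n\n")
print(few_shot_cot_prompt(question, EXAMPLES))
```

Each string would be sent to the model as-is. The only difference between the direct and zero-shot versions is the trigger phrase, while the few-shot version shows the model the reasoning format it is expected to imitate.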
When to use chain-of-thought
CoT shines on anything multi-step: math, logic, planning, debugging, and questions that need careful intermediate inference. It tends to help most with larger, more capable models — the ability to benefit from CoT is itself often cited as an emergent capability of scale. The trade-offs are that it makes responses longer, slower, and more expensive, and the visible “reasoning” is not always a faithful account of how the model actually reached its answer. For simple factual lookups it adds little. Used well, though, it is one of the highest-leverage prompting techniques available.