Chain-of-Thought Prompting: Complete Guide with Examples

Make AI reason step-by-step with this proven technique

Why thinking out loud makes models smarter

Language models that answer in a single pass tend to make careless mistakes on problems that require several steps, such as arithmetic, logic puzzles, and multi-hop questions. Chain-of-thought (CoT) prompting fixes much of this by asking the model to show its working before committing to an answer. The reason it helps is partly mechanical: each intermediate reasoning token the model generates becomes context for the tokens that follow, so the model gets more computation before it reaches the answer, and writing each step out makes it less likely to skip one. The famous trigger phrase “Let’s think step by step” was shown in a 2022 study (Kojima et al.) to raise accuracy sharply on maths word problems with no other change to the prompt. CoT is one of the highest-leverage prompting techniques because it is simple, broadly applicable, and cheap: the only cost is the extra tokens the model spends on its reasoning.

Zero-shot CoT: the cheapest win

The simplest form adds a single instruction. Instead of asking only “A shop has 23 apples, sells 15, and buys 8 more. How many does it have now?”, you ask the same question and append “Let’s think step by step.” The model then narrates: 23 − 15 = 8, 8 + 8 = 16, so 16 apples. That narration is where the accuracy comes from. Zero-shot CoT costs a few extra words in the prompt plus the tokens of the longer reply, and it works across ChatGPT, Claude, and Gemini. Use it as your default whenever a task involves calculation, ordering, or any multi-step logic. If you want a clean final answer for parsing, add: “…then give the final answer on its own line prefixed with ‘Answer:’.”
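The prompt-and-parse pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the helper names (build_zero_shot_cot_prompt, extract_final_answer) are hypothetical, no real model is called, and the sample completion is hand-written to stand in for a model reply.

```python
import re
from typing import Optional

# Trigger phrase plus the parsing instruction suggested in the text.
COT_SUFFIX = (
    "Let's think step by step, then give the final answer "
    "on its own line prefixed with 'Answer:'."
)


def build_zero_shot_cot_prompt(question: str) -> str:
    """Append the zero-shot CoT instruction to a plain question."""
    return f"{question.strip()}\n\n{COT_SUFFIX}"


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    matches = re.findall(
        r"^[ \t]*Answer:[ \t]*(.+?)[ \t]*$", completion, flags=re.MULTILINE
    )
    return matches[-1] if matches else None


prompt = build_zero_shot_cot_prompt(
    "A shop has 23 apples, sells 15, and buys 8 more. How many does it have now?"
)

# Hand-written stand-in for a model reply (assumption: no API call is made here).
sample_completion = "23 - 15 = 8\n8 + 8 = 16\nAnswer: 16"

print(prompt)
print(extract_final_answer(sample_completion))  # prints 16
```

Taking the last matching line rather than the first is a deliberate choice: a model sometimes writes “Answer:” inside its reasoning before settling on the final value.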

Few-shot CoT and self-consistency

When zero-shot reasoning is inconsistent, few-shot CoT teaches the model the style you want by showing two or three fully worked examples — each with the reasoning and the answer — before the real question. This anchors both the reasoning format and the answer format, and is more reliable for domain-specific tasks, at the cost of extra tokens. To squeeze out more accuracy, layer on self-consistency: run the CoT prompt several times with a non-zero temperature so you get different reasoning paths, then take the majority answer. Because correct reasoning tends to converge while errors scatter, the most common answer is usually right. It is more expensive (several calls per question) but valuable for high-stakes problems.
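Both ideas, worked examples in the prompt and a majority vote over several samples, can be sketched as follows. Treat it as an illustration under stated assumptions: the function names, the “Q / Reasoning / Answer” layout, and the canned replies are all hypothetical, and fake_sample stands in for a real model call made with a non-zero temperature.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, str, str]  # (question, worked reasoning, final answer)


def build_few_shot_cot_prompt(examples: List[Example], question: str) -> str:
    """Show worked examples first, then leave the reasoning open for the real question."""
    blocks = [f"Q: {q}\nReasoning: {r}\nAnswer: {a}" for q, r, a in examples]
    blocks.append(f"Q: {question}\nReasoning:")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    matches = re.findall(
        r"^[ \t]*Answer:[ \t]*(.+?)[ \t]*$", completion, flags=re.MULTILINE
    )
    return matches[-1] if matches else None


def self_consistent_answer(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> Optional[str]:
    """Sample n completions and return the most common parsed answer."""
    answers = []
    for _ in range(n):
        answer = extract_answer(sample(prompt))
        if answer is not None:  # skip replies with no parsable answer
            answers.append(answer)
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


examples = [
    ("Tom has 5 pens and buys 3 more. How many pens?", "5 + 3 = 8.", "8"),
    ("A bus has 12 riders and 4 get off. How many remain?", "12 - 4 = 8.", "8"),
]
prompt = build_few_shot_cot_prompt(
    examples, "A shop has 23 apples, sells 15, and buys 8 more. How many now?"
)

# Canned replies standing in for five sampled reasoning paths (assumption: no API call).
canned = iter([
    "23 - 15 = 8, then 8 + 8 = 16.\nAnswer: 16",
    "After the sale 8 remain; adding 8 gives 16.\nAnswer: 16",
    "23 - 15 = 7, then 7 + 8 = 15.\nAnswer: 15",
    "8 left, plus 8 bought.\nAnswer: 16",
    "I am not sure.",
])


def fake_sample(_prompt: str) -> str:
    return next(canned)


result = self_consistent_answer(fake_sample, prompt, n=5)
print(result)  # prints 16
```

The one slip (15) and the unparsable reply are simply outvoted. With a real model, normalising answers before counting (stripping units, trailing full stops, and so on) is usually needed so that equivalent answers land in the same bucket.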

Tree-of-thought, limits, and when not to use CoT

For the hardest problems, tree-of-thought generalises CoT by exploring several reasoning branches, evaluating them, and backtracking from dead ends — useful for planning and search-like tasks, though it requires orchestration beyond a single prompt. Be aware of CoT’s limits, too. It does not magically make a model factually correct — flawed steps can still lead confidently to a wrong answer, so verify the final result. And on dedicated reasoning models like OpenAI’s o-series, which already reason internally, explicit “think step by step” instructions add little and can interfere; just state the problem clearly. The practical rule: reach for zero-shot CoT by default on multi-step tasks, escalate to few-shot or self-consistency when accuracy matters, and skip it on models that already think for themselves.
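The explore, evaluate, and backtrack loop behind tree-of-thought can be illustrated with a toy depth-first search. This is only a sketch: in a real system the propose and score steps would be model calls, whereas here they are plain hypothetical functions over a small number puzzle (pick numbers from a list, in order, that sum to a target), chosen so the example runs without any API.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a partial "thought": the numbers chosen so far

NUMBERS = (7, 5, 4)  # toy puzzle data, made up for this illustration
TARGET = 9


def propose(state: State) -> List[State]:
    """Branch: extend the state with each number that comes after the last pick."""
    start = NUMBERS.index(state[-1]) + 1 if state else 0
    return [state + (n,) for n in NUMBERS[start:]]


def score(state: State) -> float:
    """Evaluate: closer to the target is better; overshooting is a dead end."""
    total = sum(state)
    return float("-inf") if total > TARGET else -(TARGET - total)


def is_solution(state: State) -> bool:
    return sum(state) == TARGET


def tree_search(
    state: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_solution: Callable[[State], bool],
    depth: int,
    min_score: float,
    trace: List[State],
) -> Optional[State]:
    """Depth-first search that tries the best-scoring branch first and backtracks."""
    trace.append(state)  # record every state actually expanded
    if is_solution(state):
        return state
    if depth == 0:
        return None
    for child in sorted(propose(state), key=score, reverse=True):
        if score(child) < min_score:
            continue  # prune branches the evaluator rejects
        found = tree_search(
            child, propose, score, is_solution, depth - 1, min_score, trace
        )
        if found is not None:
            return found
    return None  # every branch failed: the caller backtracks


trace: List[State] = []
solution = tree_search((), propose, score, is_solution, 3, -TARGET, trace)
print(solution)  # prints (5, 4)
print(trace)     # prints [(), (7,), (5,), (5, 4)]
```

The trace shows the backtracking: the search first commits to 7 because it looks closest to the target, finds that every continuation overshoots, abandons that branch, and succeeds via 5 and 4. A single linear chain of thought that started with 7 would have had no way to recover.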
