What does temperature do in an LLM?

Temperature controls how random the model's next-token choice is. At a low temperature the model strongly favours its most likely token, producing focused, predictable text. At a high temperature the probabilities flatten, so less likely tokens get a real chance, producing more varied and surprising output.

What is a good default temperature?

Around 0.7 is a sensible general-purpose default that balances coherence with some variety. Many factual or coding tasks do better near 0.0–0.3, while creative writing and brainstorming benefit from 0.8–1.2. There is no universally correct value — it depends on the task.

Does temperature 0 always give the same answer?

Temperature 0 makes the model pick its single highest-probability token at each step, so output is far more deterministic and repeatable. In practice it is not guaranteed to be identical every time, because of implementation details such as floating-point rounding on parallel hardware and tie-breaking between equally likely tokens, but it is as close to deterministic as the parameter gets.

How does temperature relate to top-p?

Both shape the randomness of sampling but in different ways: temperature rescales the whole probability distribution, while top-p truncates it to the smallest set of tokens whose probabilities sum to a threshold. They are usually tuned independently, and adjusting both at once is generally discouraged.
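To make the contrast concrete, here is a minimal plain-Python sketch of the top-p truncation described above. The function name top_p_filter and the example probabilities are illustrative assumptions, not any particular library's API; real implementations typically work on tensors of logits.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose probabilities
    sum to at least top_p, then renormalise the survivors."""
    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # threshold reached: every remaining token is dropped
    total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / total
    return filtered


# Hypothetical next-token probabilities for a four-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, 0.6))
print(top_p_filter(probs, 0.9))
```

Unlike temperature, which changes every probability, this step leaves the relative odds of the surviving tokens untouched and simply removes the tail.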

What Is Temperature in AI? Controlling Randomness in LLM Output

Temperature in one sentence

Temperature is the dial that controls how random a language model’s output is. At every step, the model produces a probability for each possible next token; temperature decides how strictly it sticks to the most likely choices. Turn it down and the model becomes focused and predictable, almost always picking its top candidate. Turn it up and the model takes more chances, giving unlikely tokens a real shot, which makes the text more varied, more creative, and eventually more chaotic.

How it works under the hood

Before choosing a token, the model converts its raw scores (logits) into probabilities using a function called softmax. Temperature is a number that divides those logits before the softmax runs. A temperature of 1 leaves the distribution unchanged. Dividing by a smaller number (low temperature) sharpens the distribution: the gap between the top choice and the rest grows, so the model almost always picks the favourite. Dividing by a larger number (high temperature) flattens the distribution: the probabilities move closer together, so weaker candidates get picked more often. Temperature 0 cannot be applied literally, since it would mean dividing by zero, so implementations typically treat it as greedy decoding and always take the single highest-probability token, which makes output close to deterministic. At 2.0 the distribution is so flat that output can become incoherent.
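The mechanism above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than how any specific model implements it: the function name, the example logits, and the choice to treat temperature 0 as a pure argmax are all assumptions made for the demo.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then apply softmax."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding, since
        # dividing by zero is undefined. All mass goes to the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share shrinking as temperature rises, while the ranking of the tokens never changes: temperature alters how lopsided the distribution is, not which token is the favourite.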

Choosing a value for the task

There is no single correct temperature — it depends on what you want. For accuracy and repeatability — code generation, data extraction, factual answers, structured output — stay low (0.0 to about 0.3) so the model commits to its best, most reliable choice. For everyday assistant work, a default around 0.7 balances coherence with a bit of variety. For creative tasks — brainstorming, fiction, marketing copy, generating diverse options — go higher (0.8 to ~1.2) to get fresh, less repetitive results. Above roughly 1.3 the output grows unpredictable and often unusable for serious work.
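One way to apply these ranges in code is a small lookup of starting values per task. The task labels, the specific numbers picked from within each range, and the helper name pick_temperature are all hypothetical; treat them as starting points to tune for your model and provider.

```python
# Hypothetical task labels mapped to starting temperatures taken from
# the ranges discussed above. These are not provider defaults.
TEMPERATURE_BY_TASK = {
    "code_generation": 0.2,
    "data_extraction": 0.0,
    "factual_answer": 0.2,
    "general_assistant": 0.7,
    "brainstorming": 1.0,
    "fiction": 1.0,
}


def pick_temperature(task, default=0.7):
    """Return a starting temperature for a task, falling back to the
    general-purpose default when the task is not listed."""
    return TEMPERATURE_BY_TASK.get(task, default)


print(pick_temperature("code_generation"))
print(pick_temperature("some_unlisted_task"))
```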

Practical tips

Adjust temperature before touching other sampling knobs, and change one knob at a time so you can tell what caused a difference. Remember that high temperature trades reliability for variety: it is a feature for creativity but a bug for tasks that demand correctness, where a high setting raises the chance of plausible-sounding mistakes. If you need both diverse and sensible output, a moderate temperature (around 0.7–0.9) usually beats a very high one. And if you are debugging odd behaviour, set temperature near 0 first to get a stable baseline, then reintroduce randomness deliberately once the prompt itself is solid.
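The stable-baseline idea can be illustrated with a toy sampler. This sketch draws token indices from a made-up three-token distribution with a seeded random generator; the function name sample_tokens, the logits, and the seed are assumptions for the demo and stand in for a real model call.

```python
import math
import random


def sample_tokens(logits, temperature, n, seed):
    """Draw n token indices from softmax(logits / temperature).

    A seeded generator makes each run of this toy repeatable."""
    rng = random.Random(seed)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=n)


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]

# Near-zero temperature: a stable baseline dominated by the top token.
baseline = sample_tokens(logits, temperature=0.01, n=200, seed=0)
# Higher temperature: the same setup now produces a mix of tokens.
varied = sample_tokens(logits, temperature=1.5, n=200, seed=0)

print("distinct tokens at 0.01:", len(set(baseline)))
print("distinct tokens at 1.5:", len(set(varied)))
```

Only the temperature differs between the two calls, which is the point of changing one knob at a time: any change in the spread of outputs can be attributed to that setting alone.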