Temperature in one sentence
Temperature is the dial that controls how random a language model’s output is. At every step, the model produces a probability for each possible next token; temperature decides how strictly it sticks to the most likely choices. Turn it down and the model becomes focused and predictable, almost always picking its top candidate. Turn it up and the model takes more chances, giving unlikely tokens a real shot — which makes the text more varied, more creative, and eventually more chaotic. The slider below lets you map common tasks to a sensible temperature and see what happens at the extremes.
How it works under the hood
Before choosing a token, the model converts its raw scores (logits) into probabilities using a function called softmax. Temperature is a positive number that divides those logits before the softmax runs. At temperature 1 the logits are unchanged, so the model samples from its original distribution. Dividing by a number below 1 (low temperature) sharpens the distribution: the gap between the top choice and the rest grows, so the model almost always picks the favourite. Dividing by a number above 1 (high temperature) flattens the distribution: the probabilities move closer together, so weaker candidates get picked more often. Temperature 0 cannot be used as a divisor, so it is typically treated as a special case in which the model always takes the single highest-probability token (greedy, effectively deterministic decoding); at 2.0 the distribution is so flat that output can become incoherent.
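To make the sharpening and flattening concrete, here is a minimal sketch of temperature-scaled softmax in plain Python. The function name softmax_with_temperature and the three example logits are made up for illustration, and handling temperature 0 as a pure argmax is an assumption about how one might implement the special case, not a description of any particular library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return next-token probabilities after dividing logits by temperature."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy decoding by putting all
        # probability on the highest logit, since dividing by zero is undefined.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

Running it should show the top token's probability shrinking as the temperature rises, while the other two candidates gain ground.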
Choosing a value for the task
There is no single correct temperature; it depends on what you want. For accuracy and repeatability (code generation, data extraction, factual answers, structured output), stay low (0.0 to about 0.3) so the model commits to its best, most reliable choice. For everyday assistant work, a default around 0.7 balances coherence with a bit of variety. For creative tasks (brainstorming, fiction, marketing copy, generating diverse options), go higher (about 0.8 to 1.2) to get fresh, less repetitive results. Above roughly 1.3 the output grows unpredictable and often unusable for serious work.
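One way to apply these ranges in practice is a small lookup table of starting points. The sketch below is illustrative only: the task labels, the exact values chosen within each range, and the pick_temperature helper are all hypothetical, and the numbers are starting points to tune rather than fixed rules.

```python
# Hypothetical task labels mapped to starting temperatures taken from the
# ranges above: low for accuracy, about 0.7 for general use, higher for
# creative work.
SUGGESTED_TEMPERATURE = {
    "code_generation": 0.0,
    "data_extraction": 0.0,
    "structured_output": 0.0,
    "factual_answer": 0.2,
    "assistant": 0.7,
    "marketing_copy": 0.9,
    "brainstorming": 1.0,
    "fiction": 1.0,
}

DEFAULT_TEMPERATURE = 0.7  # the everyday-assistant default

def pick_temperature(task: str) -> float:
    """Return a starting temperature for a task, falling back to the default."""
    return SUGGESTED_TEMPERATURE.get(task, DEFAULT_TEMPERATURE)

print(pick_temperature("data_extraction"))
print(pick_temperature("fiction"))
print(pick_temperature("something_else"))
```

Keeping the mapping in one place makes it easy to adjust a single task's setting after testing without touching the rest.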
Practical tips
Adjust temperature before touching other sampling knobs, and change one knob at a time so you can tell what caused a difference. Remember that high temperature trades reliability for variety: it is a feature for creativity but a bug for tasks that demand correctness, where a high setting raises the chance of plausible-sounding mistakes. If you need both diverse and sensible output, a moderate temperature (around 0.7–0.9) usually beats a very high one. And if you are debugging odd behaviour, set temperature near 0 first to get a stable baseline, then reintroduce randomness deliberately once the prompt itself is solid.
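The baseline-first debugging tip can be illustrated with a toy sampler. This is a sketch under stated assumptions: sample_token and distinct_tokens are hypothetical helpers, the logits are invented, and temperature 0 is treated as a plain argmax. It counts how many different tokens appear across repeated draws, so a stable baseline shows up as a count of 1.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Draw one token index from temperature-scaled logits."""
    if temperature <= 0:
        # Assumption: temperature 0 means greedy decoding (always the top logit).
        return max(range(len(logits)), key=lambda i: logits[i])
    peak = max(logits)
    # Unnormalised softmax weights; random.choices normalises them internally.
    weights = [math.exp((x - peak) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def distinct_tokens(logits, temperature, draws=200, seed=0):
    """Count how many different tokens show up over repeated draws."""
    rng = random.Random(seed)  # seeded so the experiment itself is repeatable
    return len({sample_token(logits, temperature, rng) for _ in range(draws)})

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

baseline = distinct_tokens(logits, 0.0)  # stable baseline: one token every time
varied = distinct_tokens(logits, 1.5)    # randomness reintroduced deliberately
print("distinct tokens at T=0.0:", baseline)
print("distinct tokens at T=1.5:", varied)
```

If behaviour is still odd when the count is 1, the prompt is the more likely culprit than the sampling settings.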