Temperature (AI Glossary)

The dial that controls how random vs deterministic an LLM's output is

Definition

Temperature is a sampling parameter that controls how random or how focused a language model’s output is. After the model computes a probability for every possible next token, temperature rescales those probabilities before one is chosen. A low temperature makes the model strongly favour its top-ranked tokens (predictable, repetitive); a high temperature evens out the odds so that less likely tokens get selected more often (varied, surprising).

How it works under the hood

LLMs turn raw scores (logits) into probabilities using the softmax function. Temperature divides each logit before that step. Dividing by a number below 1 exaggerates the gaps between tokens, so the highest-scoring token wins almost every time; dividing by a number above 1 compresses the gaps, giving long-shot tokens a real chance. A temperature of exactly 1 leaves the model's distribution unchanged. Dividing by 0 is undefined, so implementations typically treat temperature 0 as a special case: greedy decoding, where the model always picks its single most likely token. As temperature rises, the distribution flattens toward uniform randomness.
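The scaling step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function name `softmax_with_temperature`, the three example logits, and the choice to treat temperature 0 as greedy decoding are all assumptions made for the example.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide each logit by the temperature, then apply softmax."""
    if temperature <= 0:
        # Assumption: treat 0 as greedy decoding, putting all mass on the top logit.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it should show the top token's share of the probability shrinking as the temperature goes from 0.2 to 2.0, while the two weaker tokens gain ground.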

The practical range

Many APIs expose temperature on a scale from 0.0 to 2.0, though some cap it at 1.0, so check your provider's documentation. As a rough guide:

  • 0.0 – 0.3 — focused and near-deterministic. Best for facts, extraction, classification, math, and code.
  • 0.4 – 0.7 — balanced. Good default for general assistance and explanations.
  • 0.8 – 1.0 — creative and varied. Useful for brainstorming and casual writing.
  • 1.1 – 2.0 — high randomness. Output gets unpredictable and can lose coherence; useful mainly for ideation or deliberate divergence.

Choosing the right value

Match temperature to the cost of being wrong. For anything where a single correct answer exists — a SQL query, a JSON payload, a factual lookup — keep it low so the model stays consistent. For tasks where variety is the point — slogans, story openings, alternative phrasings — raise it. A common mistake is leaving a creative default like 0.8 on a task that needs determinism, producing flaky results that are hard to debug.
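One way to avoid the stray-default mistake is to make the choice explicit per task instead of relying on a single global setting. The sketch below is purely illustrative: the task names, the `TEMPERATURE_PRESETS` table, and the `pick_temperature` helper are hypothetical, and the values simply follow the ranges in this entry rather than any API's defaults.

```python
# Hypothetical presets: values follow the ranges in this glossary entry,
# not the defaults of any specific API.
TEMPERATURE_PRESETS = {
    "extraction": 0.0,   # one correct answer, so stay consistent
    "code": 0.2,
    "general": 0.5,      # balanced default
    "brainstorm": 0.9,   # variety is the point
}

def pick_temperature(task, default=0.5):
    """Return the preset for a task, falling back to a balanced default."""
    return TEMPERATURE_PRESETS.get(task, default)

print(pick_temperature("extraction"))
print(pick_temperature("brainstorm"))
```

Keeping the table in one place makes it easier to spot a creative value attached to a task that needs determinism.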

Temperature vs top-p

Temperature is one of two main randomness controls; the other is top-p (nucleus sampling), which restricts choices to the smallest set of tokens whose probabilities sum to at least p. They interact, and the common advice is to tune one, not both: pick temperature or top-p as your primary dial and leave the other at its default to avoid compounding effects that make behaviour hard to reason about.
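For comparison, the top-p restriction can be sketched as a filter over an already-computed probability list. This is a minimal sketch under stated assumptions: `top_p_filter` is a hypothetical helper, the four probabilities are made up, and real inference libraries differ in details such as tie handling.

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose probabilities sum to at least p,
    then renormalize so the kept probabilities sum to 1."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

# Made-up distribution over four candidate tokens.
probs = [0.5, 0.3, 0.15, 0.05]
nucleus = top_p_filter(probs, p=0.9)
print(sorted(nucleus))  # indices of the tokens that survive the cut
```

Where temperature reshapes the whole distribution, top-p cuts off its tail, which is why changing both at once makes the combined effect harder to predict.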
