Temperature & Top-p Explainer

Understand and pick the right temperature and top-p for your LLM task.


Pick sampling settings with confidence

Temperature and top-p control how random an LLM’s output is, but the numbers are abstract until you see what they do. Drag the sliders to read a plain-English explanation of each setting in real time, or load a task preset and copy the recommended values straight into your API call.

How sampling works

After processing your prompt, the model produces a probability for every possible next token. Sampling parameters decide how that distribution is turned into an actual choice:

  • Temperature rescales the model’s raw scores before they are turned into probabilities. At 0 the model is effectively greedy: it picks the most likely token every time, so output is nearly deterministic. At 1 the distribution is left unchanged; values below 1 sharpen it toward the likeliest tokens, and values above 1 flatten it, so unlikely tokens get chosen more often. High temperature means more variety and more risk of going off-topic.
  • Top-p (nucleus sampling) caps which tokens are even eligible. It keeps the smallest group of most likely tokens whose probabilities sum to at least p, then samples among those. A top_p of 0.9 trims the improbable tail; lowering it tightens the output.
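
To make the two steps concrete, here is a minimal sketch in plain Python. It assumes the model hands back one raw score (logit) per candidate token; the function names are illustrative and not taken from any particular library, and real inference stacks may order or implement these steps differently.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities via softmax(logits / temperature)."""
    if temperature <= 0:
        # Assumption: treat temperature 0 as greedy, with all mass on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize the survivors
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index after temperature scaling and top-p filtering."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, with probabilities 0.5, 0.3, 0.15 and 0.05, a top_p of 0.9 keeps the first three tokens (cumulative 0.95) and drops the last one before sampling.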

The two interact, so the usual advice is to tune one and leave the other at its default.

Task-based starting points

  • Factual QA / extraction / classification: low temperature (0–0.3) for consistent, repeatable answers.
  • Code generation: low (around 0.2) so the output stays correct without becoming rigid.
  • Summarization: moderate (0.3–0.5) to stay faithful with light rephrasing.
  • Creative writing / brainstorming: high (0.8–1.0) for variety and surprise.

When in doubt, start at the preset, generate a few samples, and nudge upward only if the output feels too repetitive.
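
As a rough sketch of that workflow, the starting points above can be stored as presets and nudged in small steps. The task keys, the single value chosen within each range, and the `sampling_params` helper are all illustrative assumptions; the 0 to 1 clamp simply mirrors the ranges listed above, and your API may accept a wider range. Top-p is deliberately left out so it stays at its default.

```python
# Illustrative presets: one value picked from within each range listed above.
PRESETS = {
    "factual_qa": {"temperature": 0.2},
    "code_generation": {"temperature": 0.2},
    "summarization": {"temperature": 0.4},
    "creative_writing": {"temperature": 0.9},
}


def sampling_params(task, nudge=0.0):
    """Return the preset for a task, with temperature shifted by `nudge`.

    The result is clamped to [0.0, 1.0], an assumption that matches the
    ranges in this guide rather than any specific API limit.
    """
    base = PRESETS[task]["temperature"]
    temperature = round(min(max(base + nudge, 0.0), 1.0), 2)
    return {"temperature": temperature}


# Start at the preset, then nudge upward if samples feel too repetitive.
params = sampling_params("summarization")
params_nudged = sampling_params("summarization", nudge=0.1)
```

The returned dictionary can be merged into whatever request your API client builds, assuming it accepts a `temperature` field.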
