Pick sampling settings with confidence
Temperature and top-p control how random an LLM’s output is, but the numbers are abstract until you see what they do. Drag the sliders to read a plain-English explanation of each setting in real time, or load a task preset and copy the recommended values straight into your API call.
How sampling works
After processing your prompt, the model produces a probability for every possible next token. Sampling parameters decide how that distribution is turned into an actual choice:
- Temperature rescales the probabilities: the model's raw scores (logits) are divided by the temperature before they are turned into probabilities. At `0` the model is nearly deterministic; it picks the most likely token almost every time. As temperature rises toward `1` (the model's unmodified distribution) and beyond, the distribution flattens, so unlikely tokens get chosen more often. High temperature means more variety and more risk of going off-topic.
- Top-p (nucleus sampling) caps which tokens are even eligible. It keeps the smallest group of top tokens whose probabilities sum to at least `p`, then samples among those. A `top_p` of `0.9` trims the improbable tail; lowering it tightens the output.
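The two settings can be sketched as a toy sampler over four hand-picked logits. This is a minimal illustration, not how any particular provider implements decoding: `apply_temperature`, `top_p_filter` and `sample_token` are hypothetical names, temperature `0` is special-cased as greedy decoding (dividing by zero is undefined), and temperature is assumed to be applied before the top-p cut.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Convert raw scores (logits) to probabilities after dividing by temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined, so treat 0 as greedy: all mass on the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of top tokens whose probabilities sum to at least top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Zero out everything outside the nucleus and renormalize the survivors.
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Pick one token index: temperature first, then the top-p cut, then a weighted draw."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for four candidate tokens
print(apply_temperature(logits, 1.0))
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))
print(sample_token(logits, temperature=0.7, top_p=0.9))
```

With these logits at temperature `1.0`, the fourth token has a probability of about 0.03, so `top_p=0.9` keeps the first three tokens and zeroes it out.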
The two interact, so the usual advice is to tune one and leave the other at its default.
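One way to see the interaction is to hold `top_p` fixed at `0.9` and count how many tokens survive the cut as temperature changes. This sketch uses four toy logits and assumes temperature is applied before the top-p cut, a common ordering that not every provider necessarily follows; `nucleus_size` is a hypothetical helper.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then normalize them into probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_size(logits, temperature, top_p):
    """Count how many tokens survive top-p filtering at a given temperature."""
    probs = sorted(softmax_with_temperature(logits, temperature), reverse=True)
    cumulative = 0.0
    for count, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            return count
    return len(probs)

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for four candidate tokens
for t in (0.5, 1.0, 3.0):
    print(t, nucleus_size(logits, t, 0.9))
```

With these numbers the nucleus holds 2 tokens at temperature `0.5`, 3 at `1.0`, and all 4 at `3.0`: raising temperature quietly loosens a top-p setting you never touched.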
Task-based starting points
- Factual QA / extraction / classification — low temperature (0–0.3) for consistent, repeatable answers.
- Code generation — low-to-moderate (0.2) so it stays correct but not rigid.
- Summarization — moderate (0.3–0.5) to stay faithful with light rephrasing.
- Creative writing / brainstorming — high (0.8–1.0) for variety and surprise.
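These presets can be kept as a small lookup table that produces request parameters. This sketch assumes your provider accepts a parameter named `temperature` (common across chat-completion APIs, but check yours); the task keys and `starting_params` are hypothetical, and it deliberately starts at the low end of each range. Only temperature is set, so `top_p` stays at the provider's default, in line with tuning one knob at a time.

```python
# Hypothetical task keys mapped to the (low, high) temperature ranges listed above.
# Extraction and classification would share the "factual_qa" range.
TEMPERATURE_RANGES = {
    "factual_qa": (0.0, 0.3),
    "code": (0.2, 0.2),
    "summarization": (0.3, 0.5),
    "creative": (0.8, 1.0),
}

def starting_params(task: str) -> dict:
    """Return request parameters that start at the low end of the task's range.

    top_p is deliberately left out so the provider's default applies.
    """
    low, _high = TEMPERATURE_RANGES[task]
    return {"temperature": low}

print(starting_params("summarization"))  # {'temperature': 0.3}
print(starting_params("creative"))  # {'temperature': 0.8}
```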
When in doubt, start at the preset, generate a few samples, and nudge upward only if the output feels too repetitive.
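That advice amounts to a small search loop. In this sketch, `generate` stands in for whatever call produces one completion at a given temperature, and "too repetitive" is approximated by a hypothetical metric: the fraction of samples that are distinct. The step size, sample count and threshold (`step=0.1`, `n_samples=5`, `min_distinct=0.6`) are illustrative assumptions, not recommended values, and `fake_generate` is a toy model used only to exercise the loop.

```python
import itertools
from typing import Callable, List

def distinct_ratio(samples: List[str]) -> float:
    """Fraction of samples that are unique; 1.0 means no two samples match."""
    return len(set(samples)) / len(samples)

def nudge_temperature(generate: Callable[[float], str], start: float, ceiling: float,
                      step: float = 0.1, n_samples: int = 5,
                      min_distinct: float = 0.6) -> float:
    """Raise temperature in small steps only while the samples look too repetitive."""
    temperature = start
    while True:
        samples = [generate(temperature) for _ in range(n_samples)]
        if distinct_ratio(samples) >= min_distinct or temperature >= ceiling:
            return temperature
        # round() keeps repeated 0.1 steps from accumulating float error.
        temperature = min(round(temperature + step, 2), ceiling)

# Fake model for illustration only: identical output below 0.5, varied output from 0.5 up.
_counter = itertools.count()

def fake_generate(temperature: float) -> str:
    return "same answer" if temperature < 0.5 else f"answer {next(_counter)}"

chosen = nudge_temperature(fake_generate, start=0.2, ceiling=1.0)
print(chosen)  # 0.5
```

The `ceiling` argument keeps the loop from drifting past the top of the task's recommended range when outputs never diversify.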