Question 1

What is the difference between temperature and top_p?

Accepted Answer

Temperature scales the randomness of the whole probability distribution, while top_p (nucleus sampling) restricts sampling to the smallest set of tokens whose probabilities sum to p. OpenAI recommends changing one or the other, not both at once, because their effects interact in ways that are hard to reason about. A common pattern is temperature 0 for deterministic tasks and around 0.7 to 1.0 for creative ones.

Question 2

Which OpenAI endpoint should I use for a chatbot?

Accepted Answer

Use the Chat Completions endpoint at https://api.openai.com/v1/chat/completions, which takes a messages array with role and content fields. The older Completions endpoint is legacy and not recommended for new work. For multi-step agents and tool orchestration, the Responses API is the newer option, but Chat Completions remains the widely supported default.

Question 3

What do max_tokens and the context window actually limit?

Accepted Answer

max_tokens caps only the length of the generated completion, not the input. The model's total context window must hold the prompt tokens plus the completion tokens together. If your prompt is large, you must leave room for the response, or the request will be rejected for exceeding the context limit.

Question 4

What does HTTP 429 mean from the OpenAI API?

Accepted Answer

A 429 means you have hit a rate limit or run out of quota. It can indicate too many requests per minute, too many tokens per minute, or an exhausted billing balance. The fix is exponential backoff with retries for transient bursts, and checking your usage dashboard and billing if it persists.

OpenAI API Cheatsheet: Parameters, Models, and Endpoints

What this cheatsheet covers

Core endpoints

Key generation parameters

Minimal request examples

Common error codes