What this cheatsheet covers
The OpenAI API is a small set of HTTP endpoints with a handful of tuning parameters, but getting predictable results depends on knowing exactly what each parameter does. This reference covers the endpoints you will actually call, the sampling parameters that shape output, the current model families, and the error codes you will hit in production, so that you can keep one tab open and stop guessing.
Core endpoints
All requests go to https://api.openai.com/v1/... and authenticate with a bearer
token in the Authorization: Bearer $OPENAI_API_KEY header.
- Chat Completions: POST /v1/chat/completions. The default for chat and most text generation. Takes a messages array of { role, content } objects, where role is system, user, assistant, or tool.
- Responses: POST /v1/responses. The newer, higher-level API for agentic workflows, built-in tools, and stateful multi-step calls.
- Embeddings: POST /v1/embeddings. Returns a vector for semantic search, clustering, and retrieval.
- Images: POST /v1/images/generations. Text-to-image generation.
- Audio: POST /v1/audio/transcriptions (speech to text) and POST /v1/audio/speech (text to speech).
- Models: GET /v1/models. Lists the model IDs available to your account.
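As a rough sketch of how the base URL, bearer token, and endpoint paths fit together, the snippet below builds (but does not send) a request using only the Python standard library. The names ENDPOINTS and build_request, the placeholder key, and the placeholder model ID are illustrative assumptions, not part of any SDK. The audio endpoints are left out because they are assumed to need non-JSON handling (file uploads, binary responses).

```python
import json
import urllib.request

BASE_URL = "https://api.openai.com/v1"

# Paths from the endpoint list above, keyed by short hypothetical names.
ENDPOINTS = {
    "chat": ("POST", "/chat/completions"),
    "responses": ("POST", "/responses"),
    "embeddings": ("POST", "/embeddings"),
    "images": ("POST", "/images/generations"),
    "models": ("GET", "/models"),
}


def build_request(name, api_key, payload=None):
    """Return an unsent urllib Request for one of the endpoints above."""
    method, path = ENDPOINTS[name]
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # JSON bodies need an explicit content type.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Example: an embeddings request. Key and model ID are placeholders.
req = build_request(
    "embeddings",
    "sk-placeholder",
    {"model": "YOUR_EMBEDDING_MODEL", "input": "hello world"},
)
# Sending it would be urllib.request.urlopen(req), which needs a real key
# and network access.
```

Keeping the paths in one table means a renamed or added endpoint is a one-line change.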
Key generation parameters
These shape the output of a chat or text request:
- model: the model ID string, e.g. a GPT-4-class or GPT-4o-class model. Always set this explicitly rather than relying on a default.
- temperature (0.0–2.0): randomness of the output. Use 0 for factual or code tasks, where you want the most repeatable output (repeated calls can still differ slightly); use 0.7–1.0 for creative writing.
- top_p (0.0–1.0): nucleus sampling, which restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p. An alternative to temperature; change one, not both.
- max_tokens (or max_completion_tokens): upper bound on the generated response length, not the input. The prompt plus completion must fit the model’s context window. max_completion_tokens is the newer name, and some newer models accept only it.
- frequency_penalty (−2.0 to 2.0): positive values discourage repeating the same tokens, reducing verbatim repetition.
- presence_penalty (−2.0 to 2.0): positive values push the model toward new topics it has not yet mentioned.
- stop: up to four strings that, when generated, halt the response.
- stream: true to receive tokens incrementally as server-sent events.
- response_format: set to an object of type json_schema that carries a JSON Schema to constrain the output to that structure.
- tools / tool_choice: declare the functions the model may call, and whether it may, must, or must not call them.
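The ranges above are easy to check before a request leaves your process. The helper below is a minimal sketch of that idea: build_chat_payload is a hypothetical name, and the checks mirror only the limits stated in this list, not the server's full validation. It uses max_completion_tokens for the length cap; swap in max_tokens if your model expects the older name.

```python
def build_chat_payload(model, messages, *, temperature=None, top_p=None,
                       frequency_penalty=None, presence_penalty=None,
                       stop=None, max_completion_tokens=None, stream=False):
    """Build a Chat Completions body, checking the ranges listed above."""

    def check(name, value, low, high):
        if value is not None and not (low <= value <= high):
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    check("temperature", temperature, 0.0, 2.0)
    check("top_p", top_p, 0.0, 1.0)
    check("frequency_penalty", frequency_penalty, -2.0, 2.0)
    check("presence_penalty", presence_penalty, -2.0, 2.0)

    if isinstance(stop, str):
        stop = [stop]  # normalize a single stop string to a list
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most four strings")
    if max_completion_tokens is not None and max_completion_tokens < 1:
        raise ValueError("max_completion_tokens must be at least 1")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "stop": stop,
        "max_completion_tokens": max_completion_tokens,
    }
    # Send only what was set, so unset parameters keep the API defaults.
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    stop="END",
)
```

Leaving unset parameters out of the body, rather than sending explicit defaults, keeps the request valid if a model rejects a parameter it does not support.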
Minimal request examples
A bare curl request to Chat Completions:
    curl https://api.openai.com/v1/chat/completions \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}],"temperature":0.7}'
The same call in Python with the official SDK:
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello"}],
        temperature=0.7,
    )
    print(resp.choices[0].message.content)
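When stream is set to true, the body arrives as server-sent events rather than one JSON object. The sketch below reassembles the text from those events. It assumes the Chat Completions chunk shape (each data: line holds JSON with the new text in choices[0].delta.content, and the stream ends with data: [DONE]) and a single choice per request. collect_stream_text and the sample lines are illustrative, not captured output.

```python
import json


def collect_stream_text(lines):
    """Join the content deltas from an iterable of SSE text lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices at all
        piece = choices[0].get("delta", {}).get("content")
        if piece:
            parts.append(piece)
    return "".join(parts)


# Hand-written sample events in the assumed shape.
sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = collect_stream_text(sample)
```

In practice the official SDK exposes streamed chunks as objects, so a parser like this is mainly useful when calling the HTTP endpoint directly.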
Common error codes
- 400: malformed request, such as bad JSON, an unknown parameter, or a context-length overflow.
- 401: invalid or missing API key.
- 403: your key lacks permission for that resource or region.
- 404: wrong endpoint path or an unavailable model ID.
- 429: rate limit hit (back off and retry), or quota/billing exhausted (retrying will not help until the quota or billing issue is resolved).
- 500 / 503: server-side error or overload; retry with exponential backoff.
Treat rate-limit 429s and 5xx as retryable with jittered exponential backoff, and treat other 4xx as a bug in your request to fix rather than retry. A 429 caused by exhausted quota needs a billing fix, not a retry. Keep model IDs and rate limits in config so you can update them as OpenAI revises the lineup.
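One way to apply that retry policy is sketched below. ApiError is a hypothetical exception carrying the HTTP status, and the attempt count and delay values are arbitrary starting points rather than recommended settings. The sketch retries every 429 and does not tell rate limits apart from exhausted quota, so a real client would want to inspect the error body as well.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    """429 and 5xx are worth retrying; other 4xx are request bugs."""
    return status == 429 or 500 <= status <= 599


def call_with_retry(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(), retrying retryable ApiErrors with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            if not is_retryable(err.status) or last_attempt:
                raise
            # The cap doubles each attempt; "full jitter" then sleeps a
            # random fraction of it so clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())
```

The sleep and rng parameters are injectable so the policy can be tested without real waiting, for example call_with_retry(send, sleep=lambda s: None).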