OpenAI API Cheatsheet: Parameters, Models, and Endpoints

The quick-reference guide every OpenAI API developer needs


What this cheatsheet covers

The OpenAI API is a small set of HTTP endpoints with a handful of tuning parameters, but getting predictable results depends on knowing exactly what each parameter does. This reference covers the endpoints you will actually call, the sampling parameters that shape output, the current model families, and the error codes you will hit in production, so you can keep one tab open and stop guessing.

Core endpoints

All requests go to https://api.openai.com/v1/... and authenticate with a bearer token in the Authorization: Bearer $OPENAI_API_KEY header.

  • Chat Completions: POST /v1/chat/completions. The default for chat and most text generation. Takes a messages array of { role, content } objects, where role is system, user, assistant, or tool.
  • Responses: POST /v1/responses. The newer, higher-level API for agentic workflows, built-in tools, and stateful multi-step calls.
  • Embeddings: POST /v1/embeddings. Returns a vector for semantic search, clustering, and retrieval.
  • Images: POST /v1/images/generations. Text-to-image generation.
  • Audio: POST /v1/audio/transcriptions (speech to text) and POST /v1/audio/speech (text to speech).
  • Models: GET /v1/models. Lists the model IDs available to your account.
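
The base URL, bearer header, and endpoint paths above fit together roughly as follows. This is a minimal sketch using only the Python standard library. The helper name build_request, the placeholder key sk-example, and the embedding model ID are illustrative assumptions, not part of the API. The helper only builds the request object and does not send it.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.openai.com/v1"


def build_request(method, path, body=None, api_key=None):
    """Build (but do not send) an authenticated request for an endpoint path.

    Hypothetical helper for illustration; `path` is relative to BASE_URL.
    """
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if body is not None:
        # JSON bodies need an explicit content type.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# "sk-example" is a placeholder, not a working key.
list_models = build_request("GET", "/models", api_key="sk-example")

# The model ID below is an assumption; use one listed by GET /v1/models.
embed = build_request(
    "POST",
    "/embeddings",
    {"model": "text-embedding-3-small", "input": "hello"},
    api_key="sk-example",
)
```

Passing either object to urllib.request.urlopen would perform the call. In practice the official SDK handles this for you, so the sketch is mainly useful for seeing what goes over the wire.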

Key generation parameters

These shape the output of a chat or text request:

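  • model: the model ID string, e.g. a GPT-4-class or GPT-4o-class model. Always set this explicitly rather than relying on a default.
  • temperature (0.0–2.0): randomness of the output. Use 0 for near-deterministic output on factual or code tasks (identical results are still not guaranteed); 0.7–1.0 for creative writing.
  • top_p (0.0–1.0): nucleus sampling, which restricts sampling to the smallest set of tokens whose cumulative probability reaches top_p. An alternative to temperature; change one, not both.
  • max_tokens (or max_completion_tokens): upper bound on the generated response length, not the input. The prompt plus completion must fit the model’s context window. Newer reasoning models accept only max_completion_tokens.
  • frequency_penalty (−2.0 to 2.0): positive values penalize tokens in proportion to how often they have already appeared, reducing verbatim repetition.
  • presence_penalty (−2.0 to 2.0): positive values penalize any token that has already appeared at all, pushing the model toward new topics.
  • stop: up to four strings that, when generated, halt the response. The stop string itself is not included in the returned text.
  • stream: set to true to receive tokens incrementally as server-sent events.
  • response_format: set to { "type": "json_object" } to get syntactically valid JSON, or use the "json_schema" type with a schema definition to constrain the output structure on models that support it.
  • tools / tool_choice: declare functions the model may call, and whether it may ("auto"), must ("required"), or must not ("none") call them. tool_choice can also name one specific function.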
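
The ranges above are easy to get wrong in config files, so it can help to check them before a request leaves your process. The following is a minimal sketch. The function build_chat_payload is a hypothetical helper, not part of the SDK, and it only enforces the limits listed in this section.

```python
def build_chat_payload(model, messages, *, temperature=None, top_p=None,
                       max_tokens=None, frequency_penalty=None,
                       presence_penalty=None, stop=None, stream=False):
    """Assemble a Chat Completions request body, checking the listed ranges.

    Hypothetical helper for illustration; unset parameters are omitted so the
    API applies its own defaults.
    """
    def check(name, value, low, high):
        if value is not None and not (low <= value <= high):
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    check("temperature", temperature, 0.0, 2.0)
    check("top_p", top_p, 0.0, 1.0)
    check("frequency_penalty", frequency_penalty, -2.0, 2.0)
    check("presence_penalty", presence_penalty, -2.0, 2.0)

    if isinstance(stop, str):
        stop = [stop]  # normalize a single stop string to a list
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most four strings")
    if max_tokens is not None and max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")

    payload = {"model": model, "messages": messages}
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "stop": stop,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    stop="END",
)
```

The resulting dictionary can be serialized as the JSON body of a request, or unpacked into the SDK call with client.chat.completions.create(**payload).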

Minimal request examples

A bare curl request to Chat Completions:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}],"temperature":0.7}'

The same call in Python with the official SDK:

from openai import OpenAI
client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
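
When stream is set to true, the response arrives as server-sent events instead of one JSON body. The sketch below shows one way to pull the text out of such a stream. It assumes the Chat Completions chunk shape, where each "data:" line carries JSON with the new text under choices[].delta.content and the stream ends with "data: [DONE]". The function name and the sample events are hand-written for illustration, not captured from a real response.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of server-sent-event lines (str)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample events standing in for a real streamed response.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_stream_text(sample)))  # prints "Hello"
```

With the official SDK you would not parse these lines yourself: passing stream=True returns an iterator of chunk objects. The sketch is meant for raw HTTP clients.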

Common error codes

  • 400 — malformed request: bad JSON, an unknown parameter, or a context-length overflow.
  • 401 — invalid or missing API key.
  • 403 — your key lacks permission for that resource or region.
  • 404 — wrong endpoint path or an unavailable model ID.
  • 429 — rate limit hit, or quota/billing exhausted; back off and retry.
  • 500 / 503 — server-side error or overload; retry with exponential backoff.

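Treat 429 and 5xx as retryable with jittered exponential backoff, and treat other 4xx responses as a bug in your request to fix rather than retry. The exception is a 429 caused by exhausted quota or billing: it will not clear on retry, so check the error body before backing off. Keep model IDs and rate limits in config so you can update them as OpenAI revises the lineup.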
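
The retry policy above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the SDK's own mechanism: ApiError is a hypothetical exception type carrying an HTTP status, and the delay constants are arbitrary starting points. It uses "full jitter", meaning each wait is drawn uniformly between zero and an exponentially growing cap.

```python
import random
import time


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # 429 and any 5xx are treated as transient; other 4xx are request bugs.
    return status == 429 or 500 <= status < 600


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(), retrying transient failures with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ApiError as err:
            is_last = attempt == max_attempts - 1
            if not is_retryable(err.status) or is_last:
                raise
            # Cap doubles each attempt: 0.5s, 1s, 2s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

The sleep parameter is injectable so the wrapper can be tested without real waiting. The official Python SDK has its own retry setting (max_retries on the client), so a wrapper like this matters mostly when calling the HTTP endpoints directly.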
