Which authentication header does the Claude API use?

The Messages API authenticates with an x-api-key header (not a Bearer token) plus a required anthropic-version header that pins the API version, for example 2023-06-01. Sending the key as a Bearer token will fail.

What is the difference between the system prompt and messages?

The top-level system field sets persistent instructions and persona for the whole conversation. The messages array holds the back-and-forth turns with user and assistant roles. Keeping instructions in system rather than a user turn makes them more reliable and cacheable.

How does prompt caching save money?

Adding cache_control to large, stable blocks like a long system prompt or document lets Anthropic reuse that prefix across requests at a steep discount, often around 90 percent cheaper for cached tokens. It also lowers latency because the model skips re-reading unchanged context.

Do I need the official SDK?

No. The API is plain HTTPS JSON, so you can call it with fetch or curl from any language. The official Python and TypeScript SDKs simply add typed helpers, streaming utilities, and retries on top of the same endpoint.

What does max_tokens control?

max_tokens caps how many tokens the model may generate in its response, and it is required. It does not limit your input. Set it high enough for the longest answer you expect, since the model stops abruptly when it hits the cap.

Getting Started with the Anthropic Claude API

What the Claude API gives you

The Anthropic API exposes Claude through a single primary endpoint — /v1/messages — that takes a list of conversation turns and returns the model’s next message. It is plain HTTPS with JSON, so you can call it from any language with no SDK at all. This guide covers the four things every beginner needs: authenticating, shaping a request, streaming responses, and controlling cost with prompt caching.

How a request is structured

Every call sends three headers — x-api-key with your secret key, anthropic-version pinning the API version, and content-type: application/json. The JSON body needs a model, a max_tokens cap, and a messages array where each item has a role of user or assistant and content. Persistent instructions go in the top-level system field rather than a user turn. Set stream: true to receive the answer as server-sent events instead of one blob.

Use the builder below to choose a model, write a system prompt and a user message, toggle streaming and caching, and copy a ready-to-run curl or JavaScript fetch snippet.

Tips for going further

Never ship your key in client-side code — proxy calls through your own backend. Turn on prompt caching for any large prefix you reuse, such as a long instruction set or a reference document, to cut both cost and latency dramatically. Watch the usage object in every response to track input and output tokens for billing. When you need structured output, ask for JSON in the system prompt and validate it on your side, or use tool definitions to force a typed shape.