What you are building
This guide gets you from zero to a working Mistral API integration. Mistral’s appeal is a blend of competitive quality, low per-token cost, and openly licensed models you can self-host — a strong default when flagship pricing would dominate your budget. The API mirrors the familiar OpenAI chat format, so if you have called any chat model before, the shape will feel immediately familiar; you mostly swap the base URL, key, and model name.
How the API works
You authenticate with an Authorization: Bearer <key> header and POST to the
chat completions endpoint with three things: a model name, a messages array
of { role, content } objects (system, user, assistant), and optional parameters
like temperature and max_tokens. The model returns a completion you read from
choices[0].message.content. From there, two features unlock production use.
JSON mode — response_format: { type: "json_object" } — guarantees the
output parses as valid JSON so you skip brittle text scraping. Function
calling — a tools array of JSON-schema function descriptions — lets the model
decide to call your code, returning a structured function name and arguments you
execute and feed back. Streaming works too: set stream: true and read tokens as
they arrive for a responsive UI.
Picking a tier and the planner below
The biggest lever is model choice, because tiers differ several-fold in price. Use a small model for classification, extraction, and high-volume simple chat; medium for balanced general assistants; large only for the hardest reasoning, code, and multi-step work. The discipline that saves the most money is to start small, measure quality on your real prompts, and step up only when the smaller model demonstrably misses. The planner below lets you compare cost across the tiers for your own token volume so you can see exactly what each rung of the ladder costs before you commit.