What the Gemini API offers
Google’s Gemini API gives you a family of multimodal models behind one endpoint: generateContent. The same request shape handles plain text, images, audio, video, and very long documents, and you choose between Flash (fast and cheap) and Pro (strongest reasoning, longest context). This guide walks through getting a key, building a request, and tuning the output.
How a request works
You send a POST to the model’s generateContent URL with your API key. The body
holds a contents array; each entry has a role and a list of parts, where a
part is either text or inline media. A generationConfig block controls
temperature, maxOutputTokens, and whether you want JSON back. Pick the model
in the URL — for example a Flash model for everyday work or a Pro model for the
hardest tasks.
The builder below lets you choose a model, write a prompt, set temperature and max output tokens, optionally request JSON output, and copy a ready-to-run curl, Node.js, or Python snippet.
Tips for going further
Use Flash by default and reserve Pro for genuinely hard or heavily multimodal
prompts to keep costs down. Lean on the large context window before reaching for
a vector database — for many documents you can simply paste the whole text. When
you need machine-readable output, set responseMimeType to application/json
and parse directly. And always check the usageMetadata field in the response to
monitor token consumption per call.