How does the Gemini API authenticate requests?

The REST API takes your key as a query parameter, key=YOUR_KEY, on the generateContent URL, or as an x-goog-api-key header. The official client SDKs read the key from an environment variable. Always keep the key server-side.

When should I use Flash versus Pro?

Flash models are optimized for speed and low cost and handle the majority of chat, summarization, and extraction tasks well. Pro models cost more but reason better on complex, multi-step, or highly multimodal problems. Start with Flash and upgrade only where quality demands it.

Is Gemini really multimodal?

Yes. A single generateContent request can mix text with images, audio, video, and PDFs by adding parts to the contents array, using inline base64 data or a file reference uploaded through the Files API for larger media.

What does the long context window enable?

Gemini models offer very large context windows, letting you pass whole books, large codebases, or hours of transcript in one request. This reduces the need for chunking in many retrieval workflows, though cost still scales with input tokens.

How do I get structured JSON back?

Set responseMimeType to application/json in generationConfig and optionally supply a responseSchema describing the shape you want. The model then returns valid JSON you can parse directly instead of free-form text you have to clean up.

Getting Started with the Google Gemini API

What the Gemini API offers

Google’s Gemini API gives you a family of multimodal models behind one endpoint: generateContent. The same request shape handles plain text, images, audio, video, and very long documents, and you choose between Flash (fast and cheap) and Pro (strongest reasoning, longest context). This guide walks through getting a key, building a request, and tuning the output.

How a request works

You send a POST to the model’s generateContent URL with your API key. The body holds a contents array; each entry has a role and a list of parts, where a part is either text or inline media. A generationConfig block controls temperature, maxOutputTokens, and whether you want JSON back. Pick the model in the URL — for example a Flash model for everyday work or a Pro model for the hardest tasks.

The builder below lets you choose a model, write a prompt, set temperature and max output tokens, optionally request JSON output, and copy a ready-to-run curl, Node.js, or Python snippet.

Tips for going further

Use Flash by default and reserve Pro for genuinely hard or heavily multimodal prompts to keep costs down. Lean on the large context window before reaching for a vector database — for many documents you can simply paste the whole text. When you need machine-readable output, set responseMimeType to application/json and parse directly. And always check the usageMetadata field in the response to monitor token consumption per call.