Getting Started with the Google Gemini API

Build your first multimodal AI app with Gemini

Ad placeholder (leaderboard)

What the Gemini API offers

Google’s Gemini API gives you a family of multimodal models behind one endpoint: generateContent. The same request shape handles plain text, images, audio, video, and very long documents, and you choose between Flash (fast and cheap) and Pro (strongest reasoning, longest context). This guide walks through getting a key, building a request, and tuning the output.

How a request works

You send a POST to the model’s generateContent URL with your API key. The body holds a contents array; each entry has a role and a list of parts, where a part is either text or inline media. A generationConfig block controls temperature, maxOutputTokens, and whether you want JSON back. Pick the model in the URL — for example a Flash model for everyday work or a Pro model for the hardest tasks.

The builder below lets you choose a model, write a prompt, set temperature and max output tokens, optionally request JSON output, and copy a ready-to-run curl, Node.js, or Python snippet.

Tips for going further

Use Flash by default and reserve Pro for genuinely hard or heavily multimodal prompts to keep costs down. Lean on the large context window before reaching for a vector database — for many documents you can simply paste the whole text. When you need machine-readable output, set responseMimeType to application/json and parse directly. And always check the usageMetadata field in the response to monitor token consumption per call.

Ad placeholder (rectangle)