Why refine an image prompt?

Image models respond strongly to explicit cues about style, lighting, lens, composition, and mood. A vague prompt yields generic output; a refined one with concrete descriptors produces sharper, more intentional images.

Where does my API key go?

It stays in your browser tab and is sent directly to OpenAI or Anthropic with the request you trigger. It is never stored, logged, or routed through any Gera server, and refreshing the tab clears it.

How are the variants tailored to each generator?

The tool tells the model which generator you are targeting so it can match conventions — Midjourney parameter flags like --ar and --style, Stable Diffusion comma-separated tag style, or DALL-E natural-language descriptions.

Does this generate images?

No. It only produces refined text prompts. You then paste those prompts into your image generator of choice. This keeps the tool free of image-generation costs and provider lock-in.

Who pays for the API calls?

You do, on your own provider account. Each refinement is one API call billed at your provider's usage rate. The tool itself is free; your only cost is the tokens consumed on your key.
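As a rough illustration of how that per-call cost adds up, here is a minimal sketch of the arithmetic. The `TokenPricing` shape, the function name, and the prices are hypothetical placeholders, not any provider's actual rates; check your provider's pricing page for real numbers.

```typescript
// Hypothetical pricing shape; real rates come from your provider's pricing page.
interface TokenPricing {
  inputUsdPerMillion: number;  // assumed price per 1M input tokens
  outputUsdPerMillion: number; // assumed price per 1M output tokens
}

// Cost of one refinement = input tokens * input rate + output tokens * output rate.
function estimateRefineCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: TokenPricing,
): number {
  return (
    (inputTokens / 1e6) * pricing.inputUsdPerMillion +
    (outputTokens / 1e6) * pricing.outputUsdPerMillion
  );
}

// Placeholder numbers for illustration only.
const examplePricing: TokenPricing = { inputUsdPerMillion: 1, outputUsdPerMillion: 4 };
const exampleCost = estimateRefineCostUsd(400, 600, examplePricing);
console.log(exampleCost.toFixed(4)); // prints "0.0028"
```

The token counts for a real call are reported in the provider's response, so the estimate can be replaced with the actual figure after each refinement.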

What is the Image Prompt Refiner (BYO-key)?

Use your own OpenAI or Anthropic API key to analyze an image-generation prompt and rewrite it with stronger style, lighting, composition, and detail cues. The tool returns three enhanced variants tuned for your target generator. It runs free in your browser on Gera Tools: your key stays in the tab, and nothing is uploaded to a Gera server.

Image Prompt Refiner (BYO-key)

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Image models reward specificity. “A cat” gives you a generic cat; “a ginger tabby on a rain-streaked windowsill, soft overcast light, 35mm, shallow depth of field, melancholic mood” gives you a photograph. This tool takes your rough idea and, using your own OpenAI or Anthropic key, rewrites it into three polished prompt variants tuned for your target generator.

How it works

Choose a provider and model, paste your API key, type your rough prompt, and select the target generator — Midjourney, DALL-E, or Stable Diffusion. The tool sends one direct request from your browser asking the model to enrich your prompt with concrete cues for style, lighting, composition, lens, and mood, and to format the output for the chosen generator (parameter flags for Midjourney, tag lists for Stable Diffusion, natural language for DALL-E). It returns three distinct variants so you can pick a direction.

Your key never reaches a Gera server — it is held only in the tab and sent straight to the provider (with the official direct-browser-access header for Anthropic). Refreshing clears it.
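The direct request described above might be assembled roughly as follows. This is a sketch under assumptions, not the tool's actual source: `buildRefineRequest`, `RefineRequest`, the `instructions` text, and the `max_tokens` value are hypothetical, while the endpoints and header names are the providers' documented ones.

```typescript
type Provider = "openai" | "anthropic";

// Hypothetical container for everything a single POST needs.
interface RefineRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildRefineRequest(
  provider: Provider,
  apiKey: string,
  model: string,
  instructions: string, // assumed system prompt describing the target generator
  roughPrompt: string,
): RefineRequest {
  if (provider === "anthropic") {
    return {
      url: "https://api.anthropic.com/v1/messages",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        // Opt-in header Anthropic requires for calls made straight from a browser.
        "anthropic-dangerous-direct-browser-access": "true",
      },
      body: JSON.stringify({
        model,
        max_tokens: 1024, // assumed cap; enough for three short variants
        system: instructions,
        messages: [{ role: "user", content: roughPrompt }],
      }),
    };
  }
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: instructions },
        { role: "user", content: roughPrompt },
      ],
    }),
  };
}
```

The result can be passed to the browser's `fetch` as `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`. Because the key only ever appears in that one outgoing request, nothing needs to be written to storage or sent to an intermediate server.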

What gets added to each dimension

Style layers in the medium (photography, oil painting, digital illustration, watercolour), the period or art movement (Art Nouveau, brutalist, hyperrealistic), and optionally a reference aesthetic (cinematic, editorial, architectural render). A generic “futuristic city” becomes “a neo-brutalist skyline rendered as a Ridley Scott-era science fiction film still.”

Lighting and mood add the light source direction (side-lit, back-lit, rim-lit), quality (soft diffused, harsh directional, golden-hour warmth), colour temperature (cool blue, warm amber), and overall atmosphere. Lighting is often the single biggest determinant of whether an image feels professional or flat.

Composition and lens specify the camera angle (low angle, overhead, eye-level), framing convention (rule of thirds, centred, negative space), and for photography prompts, the equivalent focal length and aperture (35mm at f/1.8 for shallow depth of field; 85mm portrait; wide 16mm for environmental context). Framing choices pair naturally with the aspect ratio, which Midjourney sets through its --ar parameter.
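One way to picture these dimensions is as a small record that gets merged with the subject. The sketch below is illustrative only: the `PromptCues` fields and `enrichSubject` are hypothetical names, and in the actual tool the language model writes this text rather than a fixed function.

```typescript
// Hypothetical record of the dimensions described above.
interface PromptCues {
  style: string;       // medium, movement, or reference aesthetic
  lighting: string;    // direction, quality, colour temperature
  composition: string; // camera angle and framing convention
  lens?: string;       // focal length and aperture, for photography prompts
  mood: string;        // overall atmosphere
}

// Append each cue to the subject, skipping any that were left empty.
function enrichSubject(subject: string, cues: PromptCues): string {
  const parts = [subject, cues.style, cues.lighting, cues.composition, cues.lens, cues.mood];
  return parts
    .filter((p): p is string => typeof p === "string" && p.length > 0)
    .join(", ");
}

const enrichedCat = enrichSubject("a ginger tabby on a rain-streaked windowsill", {
  style: "photography",
  lighting: "soft overcast light",
  composition: "eye-level, rule of thirds",
  lens: "35mm at f/1.8",
  mood: "melancholic mood",
});
console.log(enrichedCat);
```

Keeping the subject separate from the cues makes it easy to vary one dimension at a time, for example swapping only the lighting between variants.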

Generator-specific formatting

Each generator has conventions the refiner follows:

Midjourney: Natural language description followed by parameter flags (--ar 16:9 --style raw --v 6). The refiner outputs prompts in this format.
Stable Diffusion: Comma-separated weighted tags ((dramatic lighting:1.3), bokeh, hyperdetailed, 8k). The refiner formats output as a weighted tag list.
DALL-E: Full natural-language sentences without special syntax, since DALL-E 3 follows instruction-style prompts more reliably than tag lists.
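The three conventions above can be sketched as simple formatting helpers. The function names and the `WeightedTag` type are hypothetical, and the `(text:weight)` emphasis syntax is the form used by common Stable Diffusion front ends, so treat it as an assumption about your particular setup.

```typescript
// Hypothetical tag type: weight is optional emphasis, e.g. 1.3.
interface WeightedTag {
  text: string;
  weight?: number;
}

// Midjourney: description first, then parameter flags.
function formatMidjourney(description: string, aspectRatio: string): string {
  return `${description} --ar ${aspectRatio} --style raw --v 6`;
}

// Stable Diffusion: comma-separated tags, weighted ones wrapped as (text:weight).
function formatStableDiffusion(tags: WeightedTag[]): string {
  return tags
    .map((t) => (t.weight === undefined ? t.text : `(${t.text}:${t.weight})`))
    .join(", ");
}

// DALL-E: one plain sentence with no special syntax.
function formatDallE(subject: string, details: string[]): string {
  const sentence = details.length > 0 ? `${subject}, with ${details.join(", ")}` : subject;
  return sentence.charAt(0).toUpperCase() + sentence.slice(1) + ".";
}

console.log(formatMidjourney("a neo-brutalist skyline at dusk", "16:9"));
console.log(formatStableDiffusion([
  { text: "dramatic lighting", weight: 1.3 },
  { text: "bokeh" },
  { text: "hyperdetailed" },
  { text: "8k" },
]));
console.log(formatDallE("a neo-brutalist skyline at dusk", ["warm rim lighting", "a low camera angle"]));
```

The same subject and cues produce three differently shaped strings, which is why a prompt tuned for one generator rarely transfers unchanged to another.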

Tips

Keep your input focused on the subject; let the refiner add the descriptive scaffolding around it.
Render all three variants, then combine the strongest phrasing from each into a final prompt.
Smaller models (for example gpt-4o-mini or a Claude Haiku model) usually handle prompt rewriting well and keep the API cost per refinement low.
After you find a winning prompt formula, save it as a template and swap out just the subject for future images.
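The template tip in the last item can be as simple as swapping the subject for a placeholder. This is a minimal sketch; the `{subject}` marker and both function names are hypothetical conventions, not something the tool produces.

```typescript
// Hypothetical placeholder marking where the subject goes.
const SUBJECT_SLOT = "{subject}";

// Turn a winning prompt into a reusable template by replacing its subject.
function makePromptTemplate(winningPrompt: string, subject: string): string {
  if (winningPrompt.indexOf(subject) === -1) {
    throw new Error("Subject not found in prompt");
  }
  // split/join replaces every occurrence without regex or "$" surprises.
  return winningPrompt.split(subject).join(SUBJECT_SLOT);
}

// Fill the template with a new subject.
function fillPromptTemplate(template: string, subject: string): string {
  return template.split(SUBJECT_SLOT).join(subject);
}

const savedTemplate = makePromptTemplate(
  "a ginger tabby, soft overcast light, 35mm, shallow depth of field --ar 3:2",
  "a ginger tabby",
);
console.log(fillPromptTemplate(savedTemplate, "an old fishing boat"));
// prints "an old fishing boat, soft overcast light, 35mm, shallow depth of field --ar 3:2"
```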