How long can Stable Audio clips be?

Stable Audio can generate longer clips than most text-to-audio models — up to roughly three minutes depending on the model version. You set the target duration in the interface, and the prompt should describe an arrangement that fills that length.

Should I put the duration in the prompt or the duration field?

Use the dedicated duration control as the source of truth; the model honors it directly. Mentioning the length in the prompt as well ("looping 30-second bed") helps the model pace the arrangement so it does not feel cut off.

What makes a good Stable Audio music prompt?

Lead with genre and instrumentation, then add BPM, mood, and production descriptors like "clean mix" or "lo-fi tape saturation". Stable Audio responds well to comma-separated descriptive tags rather than full sentences.

Can Stable Audio make sound effects, not just music?

Yes. For SFX, drop musical terms entirely and describe the literal sound, environment, and intensity — for example "heavy rain on a tin roof, distant thunder, 20 seconds". Keep musical and sound-effect vocabulary separate for the cleanest output.

What is the Stable Audio Prompt Guide?

It is a guide to Stability AI's Stable Audio model. It covers prompt structure, duration specification, music versus sound-effect mode, and quality descriptors, so that your generated audio matches the length and style you intend. The guide runs free in your browser on Gera Tools, with nothing uploaded.

Name: Stable Audio Prompt Guide
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Stable Audio prompt guide

Stable Audio is Stability AI’s text-to-audio model, and it stands out for generating longer clips — full musical phrases and loops rather than just a few seconds. It reads comma-separated descriptive tags well, and it has a dedicated duration control that it honors precisely. The key to good output is matching your vocabulary to the mode you want: musical prompts and sound-effect prompts pull the model in very different directions.

How it works

For music, lead with genre and instrumentation, then layer in BPM, mood, and production descriptors. Stable Audio responds to tags like “ambient, warm pads, 70 BPM, lo-fi tape saturation, looping” more reliably than to prose. For sound effects, strip out all musical language and describe the literal sound, its environment, and its intensity. Set the duration field to the length you actually need; the model paces the arrangement to fill it rather than looping or trailing into silence.

Music prompt anatomy

A well-structured Stable Audio music prompt layers information from broadest to most specific:

[genre] + [instrumentation] + [tempo] + [mood] + [production] + [structure hint]

For example: ambient electronic, warm sustained pads, soft sub-bass, 72 BPM, introspective, lo-fi texture, gentle fade in, 60-second loop

Each layer does a different job:

Genre anchors the model’s style vocabulary.
Instrumentation specifies the sonic palette — real instruments, synths, or hybrid.
Tempo controls energy and feel: try 60–90 BPM for ambient work and 120–140 BPM for club music.
Mood refines the emotional character — melancholic, euphoric, tense, serene.
Production controls texture and finish — “clean mix”, “lo-fi tape saturation”, “punchy transients”, “wide stereo”.
Structure hint helps with longer clips — “builds from minimal to full”, “looping”, “gentle intro”.
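
The layering above can be sketched as a small helper that assembles the tags in order. This is only an illustration: the `MusicPrompt` class and its field names are hypothetical, not part of Stable Audio or any Stability AI API, and the result is just a string you would paste into the prompt box.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class MusicPrompt:
    """Hypothetical container for the six layers of a music prompt."""

    genre: str
    instrumentation: list[str]
    bpm: Optional[int] = None
    mood: Optional[str] = None
    production: Optional[str] = None
    structure: Optional[str] = None

    def render(self) -> str:
        # Order tags from broadest (genre) to most specific (structure hint),
        # skipping any layer that was left empty.
        parts = [self.genre, *self.instrumentation]
        if self.bpm is not None:
            parts.append(f"{self.bpm} BPM")
        for layer in (self.mood, self.production, self.structure):
            if layer:
                parts.append(layer)
        return ", ".join(parts)


prompt = MusicPrompt(
    genre="ambient electronic",
    instrumentation=["warm sustained pads", "soft sub-bass"],
    bpm=72,
    mood="introspective",
    production="lo-fi texture",
    structure="gentle fade in, 60-second loop",
)
print(prompt.render())
```

With these values the helper reproduces the example prompt shown earlier. Keeping each layer in its own field makes it easy to change one layer, such as tempo, while leaving the rest untouched.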

Sound effect prompt anatomy

For SFX, drop all musical vocabulary and focus on the literal acoustic event:

[sound source] + [action/motion] + [environment/reverb] + [intensity] + [duration]

For example: heavy rainfall on a corrugated metal roof, close microphone, no reverb, occasional distant thunder, 30 seconds

Common mistakes in SFX prompts:

Including musical terms like “atmospheric” or “rhythmic”, which pushes the model toward music.
Omitting the environment: “dog barking” gives different acoustics than “dog barking in a large empty warehouse”.
Not specifying intensity: “light rain” and “torrential downpour” produce very different results.
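
As a rough sketch, the SFX anatomy and the first mistake above could be checked before generating. The function names here are hypothetical, and the word list is an assumption: only “atmospheric” and “rhythmic” come from this guide, and the other entries are illustrative guesses rather than a known list of terms the model treats as musical.

```python
import re

# Illustrative list only. "atmospheric" and "rhythmic" are named in the guide;
# the remaining entries are assumptions.
MUSICAL_TERMS = {"atmospheric", "rhythmic", "melodic", "bpm", "chords"}


def build_sfx_prompt(source, action=None, environment=None,
                     intensity=None, seconds=None):
    """Join the SFX layers in anatomy order, skipping empty ones."""
    parts = [source, action, environment, intensity]
    if seconds is not None:
        parts.append(f"{seconds} seconds")
    return ", ".join(p for p in parts if p)


def find_musical_terms(prompt):
    """Return the flagged musical words found in a prompt, sorted."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return sorted(words & MUSICAL_TERMS)


sfx = build_sfx_prompt(
    "dog barking",
    environment="large empty warehouse",
    intensity="loud",
    seconds=10,
)
print(sfx)
print(find_musical_terms(sfx))
print(find_musical_terms("rhythmic dripping water, atmospheric cave"))
```

An empty list from `find_musical_terms` only means none of the listed words appear; it does not guarantee the model will avoid a musical result.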

Prompt examples by use case

Use case	Sample prompt
Podcast background bed	`soft lo-fi piano, light brushed drums, warm bass, 80 BPM, calm, 60 seconds, loopable`
Game ambient environment	`dark forest at night, crickets, distant owl, light wind through leaves, no music, 30 seconds`
Video game boss theme	`orchestral, brass stabs, driving strings, epic choir, 140 BPM, intense, full mix`
Relaxation app sound	`ocean waves on a pebbly beach, slow steady swell, fading distance, no music, 90 seconds`
Product launch video	`uplifting corporate pop, clean electric piano, light percussion, 120 BPM, optimistic, 30 seconds`

Tips for better results

Match the duration field to the prompt. Request a 30-second bed in both the prompt and the duration control so the pacing fits the length.
Use comma tags, not sentences. “deep house, rolling bassline, 124 BPM, punchy kick, club mix” beats a paragraph describing the same thing.
Add production descriptors. “clean mix”, “warm analog”, and “wide stereo” noticeably change the texture and polish of the output.
Separate music and SFX vocab. Mixing “cinematic strings” with “footsteps on gravel” in one prompt confuses the model — pick one intent per generation.
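
The first tip can be checked mechanically. The sketch below is a hypothetical helper, not part of any Stable Audio tooling: it pulls a length such as “60 seconds” or “30-second” out of the prompt text and compares it with the value you plan to enter in the duration control. It assumes durations are written in whole seconds.

```python
import re
from typing import Optional

# Matches "60 seconds", "60 second", "60-second", and "60seconds".
_DURATION = re.compile(r"(\d+)[\s-]?seconds?\b", re.IGNORECASE)


def prompt_duration(prompt: str) -> Optional[int]:
    """Return the first duration in seconds mentioned in the prompt, if any."""
    match = _DURATION.search(prompt)
    return int(match.group(1)) if match else None


def duration_matches(prompt: str, duration_field_seconds: int) -> bool:
    """True when the prompt names no duration or names the same one."""
    mentioned = prompt_duration(prompt)
    return mentioned is None or mentioned == duration_field_seconds


bed = "soft lo-fi piano, light brushed drums, warm bass, 80 BPM, calm, 60 seconds, loopable"
print(prompt_duration(bed))
print(duration_matches(bed, 60))
print(duration_matches(bed, 30))
```

A mismatch is worth fixing before you generate, since the duration control is the source of truth and the prompt text only guides pacing.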