Stable Audio Prompt Guide

Write effective Stable Audio prompts for music and sound effects



Stable Audio is Stability AI’s text-to-audio model. It stands out for generating longer clips, such as full musical phrases and loops, rather than just a few seconds of audio. It reads comma-separated descriptive tags well, and it has a dedicated duration control that it honors precisely. The key to good output is matching your vocabulary to the mode you want, because musical prompts and sound-effect prompts pull the model in very different directions.

How it works

For music, lead with genre and instrumentation, then layer in BPM, mood, and production descriptors. Stable Audio responds to tags like “ambient, warm pads, 70 BPM, lo-fi tape saturation, looping” more reliably than to prose. For sound effects, strip out all musical language and describe the literal sound, its environment, and its intensity. Set the duration field to the length you actually need; the model paces the arrangement to fill it rather than looping or trailing into silence.
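The ordering described above can be captured in a small helper that always emits tags in the same sequence. This is a minimal sketch, assuming you assemble prompts in Python before sending them to whatever Stable Audio interface you use. `MusicPrompt`, `SfxPrompt`, and `build_request` are hypothetical names, and the `"prompt"` and `"duration_seconds"` keys are placeholders rather than a real Stable Audio API schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MusicPrompt:
    """Hypothetical helper: genre first, then instruments, BPM, mood, production."""
    genre: str
    instruments: list[str] = field(default_factory=list)
    bpm: int | None = None
    mood: list[str] = field(default_factory=list)
    production: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        # Build the tag list in the recommended order and join with commas.
        tags = [self.genre, *self.instruments]
        if self.bpm is not None:
            tags.append(f"{self.bpm} BPM")
        tags += self.mood + self.production
        return ", ".join(tags)


@dataclass
class SfxPrompt:
    """Hypothetical helper: literal sound, environment, intensity; no musical terms."""
    sound: str
    environment: str
    intensity: str

    def to_prompt(self) -> str:
        return ", ".join([self.sound, self.environment, self.intensity])


def build_request(prompt: str, seconds: float) -> dict:
    # Placeholder request shape: these key names are assumptions, not a real API.
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return {"prompt": prompt, "duration_seconds": seconds}


if __name__ == "__main__":
    music = MusicPrompt("ambient", ["warm pads"], 70, [], ["lo-fi tape saturation", "looping"])
    print(build_request(music.to_prompt(), 30))
    sfx = SfxPrompt("footsteps on gravel", "quiet forest path", "soft")
    print(build_request(sfx.to_prompt(), 5))
```

Keeping music and SFX as separate types makes it harder to slip musical terms into a sound-effect prompt, and passing the duration alongside the prompt keeps the two in sync.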

Tips for better results

  • Match the duration field to the prompt. If you want a 30-second music bed, say so in the prompt and set the duration control to 30 seconds so the pacing fits the length.
  • Use comma tags, not sentences. “deep house, rolling bassline, 124 BPM, punchy kick, club mix” beats a paragraph describing the same thing.
  • Add production descriptors. “clean mix”, “warm analog”, and “wide stereo” noticeably change the texture and polish of the output.
  • Separate music and SFX vocab. Mixing “cinematic strings” with “footsteps on gravel” in one prompt confuses the model — pick one intent per generation.
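The tips above can be turned into a rough pre-flight check. This is a heuristic sketch, not anything Stable Audio provides. The word lists are small illustrative assumptions and `lint_prompt` is a hypothetical name. The function flags prompts that mix music and SFX vocabulary, prompts that read like prose instead of comma tags, and a length stated in the prompt that disagrees with the duration field.

```python
import re

# Illustrative vocabularies only; extend them for your own prompts.
MUSIC_TERMS = {"bpm", "strings", "bassline", "kick", "pads", "chords", "melody", "mix", "house", "ambient"}
SFX_TERMS = {"footsteps", "gravel", "door", "rain", "explosion", "engine", "glass", "wind"}

# Matches lengths such as "30s", "30 sec", "30-second", "30 seconds".
DURATION_RE = re.compile(r"(\d+)\s*-?\s*(?:seconds?|secs?|s)\b")


def split_tags(prompt: str) -> list[str]:
    # Split a comma-tag prompt into trimmed, lowercase tags, dropping empties.
    return [t.strip().lower() for t in prompt.split(",") if t.strip()]


def lint_prompt(prompt: str, duration_seconds: float) -> list[str]:
    """Return warnings based on the tips above (heuristic sketch)."""
    warnings = []
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    if words & MUSIC_TERMS and words & SFX_TERMS:
        warnings.append("mixes music and SFX vocabulary; pick one intent")
    tags = split_tags(prompt)
    # Very few tags, or one very long tag, suggests a sentence instead of tags.
    if len(tags) < 2 or any(len(t.split()) > 6 for t in tags):
        warnings.append("reads like prose; prefer short comma-separated tags")
    match = DURATION_RE.search(prompt.lower())
    if match and float(match.group(1)) != duration_seconds:
        warnings.append(f"prompt says {match.group(1)} s but duration is {duration_seconds} s")
    return warnings


if __name__ == "__main__":
    print(lint_prompt("deep house, rolling bassline, 124 BPM, punchy kick, club mix", 30))
    print(lint_prompt("cinematic strings, footsteps on gravel", 10))
    print(lint_prompt("ambient, warm pads, 30-second bed", 20))
```

A keyword check like this will miss many cases and raise some false alarms, so treat its output as a prompt to reread the text rather than a verdict.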