MusicGen Prompt Guide

Write optimized prompts for Meta's MusicGen audio generation model



MusicGen is Meta’s open text-to-music model, released as part of the AudioCraft library. It is text-conditioned: you describe the music in words and it generates audio. It is fast and runs locally, but it has two quirks worth planning around: a practical 30-second limit per generation, and three modes (plain generation, continuation, and melody conditioning) that each call for a slightly different prompting approach.

How it works

In generation mode, MusicGen reads a short descriptive phrase (genre, instrumentation, tempo, mood) and produces a clip of up to roughly 30 seconds. For longer tracks you switch to continuation: you pass in an existing audio segment and the model extends it in the same style, which helps a multi-minute piece stay coherent. The melody-conditioned variant additionally takes a reference melody, supplied as audio, and arranges your described style around its contour. The builder on this page writes the text-conditioning string and reminds you which mode fits your goal.
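The two jobs described above, assembling a text-conditioning string and picking a mode, can be sketched in a few lines of Python. This is a minimal illustration, not the page builder's actual code: `MusicPrompt` and `choose_mode` are hypothetical names, the comma-separated ordering is just one reasonable convention, and the mode rules simply restate this guide's rules of thumb. The resulting string is the kind of description you would pass to AudioCraft's `MusicGen.generate`, which takes a list of description strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MusicPrompt:
    """Hypothetical container for the parts of a MusicGen text prompt."""
    genre: str
    instruments: List[str] = field(default_factory=list)
    bpm: Optional[int] = None
    mood: Optional[str] = None

    def to_text(self) -> str:
        # Join the non-empty parts into one comma-separated descriptive phrase.
        parts = [self.genre]
        parts.extend(self.instruments)
        if self.bpm is not None:
            parts.append(f"{self.bpm} BPM")
        if self.mood:
            parts.append(self.mood)
        return ", ".join(parts)


def choose_mode(target_seconds: float, has_seed_audio: bool = False,
                has_reference_melody: bool = False,
                window: float = 30.0) -> str:
    """Hypothetical helper applying this guide's rules of thumb.

    - a reference melody you want arranged -> melody conditioning
    - existing audio to extend, or a target longer than one window -> continuation
    - otherwise -> plain generation
    """
    if has_reference_melody:
        return "melody"
    if has_seed_audio or target_seconds > window:
        return "continuation"
    return "generation"


prompt = MusicPrompt(
    genre="lo-fi jazz",
    instruments=["warm Rhodes piano", "soft brushed drums", "upright bass"],
    bpm=90,
    mood="relaxed",
)
print(prompt.to_text())
print(choose_mode(20))
print(choose_mode(120))
```

Keeping the pieces as separate fields makes it easy to swap one instrument or tempo at a time and compare outputs, which is usually more informative than rewriting the whole phrase.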

Tips for better MusicGen output

  • Be concrete about instruments. “warm Rhodes piano, soft brushed drums, upright bass” gives a far more controlled result than “jazzy”.
  • State tempo, not key. “90 BPM, relaxed” works; an explicit key signature is not reliably followed, so describe the mood instead.
  • Chain with continuation for length. Don’t push a single call past 30 seconds — generate a base and continue from its tail to stay coherent.
  • Use melody conditioning to arrange. If you already have a tune, the melody-conditioned model will dress it in your described genre.
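The "chain with continuation" tip is mostly arithmetic: one base generation, then enough continuation calls to reach the target length. The sketch below plans those calls under two stated assumptions: each call yields a 30-second window, and a continuation call spends part of that window re-reading the tail of the previous audio (10 seconds here, an arbitrary choice), so it adds only `window - context` new seconds. `plan_continuation` is a hypothetical helper, not an AudioCraft function.

```python
from typing import List, Tuple


def plan_continuation(target_seconds: float, window: float = 30.0,
                      context: float = 10.0) -> List[Tuple[str, float, float]]:
    """Return (mode, start, end) spans of *new* audio for each call.

    Assumption: every call produces `window` seconds, and each continuation
    call reuses `context` seconds of the previous tail as its prompt, so it
    contributes `window - context` seconds of new material.
    """
    if not 0 <= context < window:
        raise ValueError("context must be in [0, window)")
    # The first call is plain generation and covers up to one full window.
    first = min(target_seconds, window)
    plan = [("generate", 0.0, first)]
    covered = first
    step = window - context
    # Each continuation extends the piece by at most `step` seconds.
    while covered < target_seconds:
        end = min(covered + step, target_seconds)
        plan.append(("continue", covered, end))
        covered = end
    return plan


for mode, start, end in plan_continuation(90):
    print(f"{mode:<9} {start:5.1f}s -> {end:5.1f}s")
```

For a 90-second track this plans one 30-second generation followed by three continuations of 20 new seconds each. A longer `context` gives the model more of the previous tail to stay consistent with, at the cost of more calls for the same total length.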