MusicGen prompt guide
MusicGen is Meta’s open text-to-music model (part of AudioCraft). It is text-conditioned: you describe the music in words and it generates audio. It is fast and runs locally, but it has two quirks worth planning around — a practical 30-second per-generation limit and several modes (plain generation, continuation, and melody conditioning) that each call for a slightly different prompt approach.
How it works
In generation mode, MusicGen reads a short descriptive phrase — genre, instrumentation, tempo, mood — and produces a clip up to roughly 30 seconds. For longer tracks you switch to continuation: you pass in an existing audio segment and the model extends it in the same style, which keeps a multi-minute piece coherent. The melody-conditioned variant additionally takes a reference melody and arranges your described style around that contour. The builder here writes the text-conditioning string and reminds you which mode fits your goal.
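As a rough illustration of what such a builder might do, the sketch below assembles a text-conditioning string from genre, instruments, tempo, and mood, and suggests a mode from the target length. The names `PromptSpec`, `build_prompt`, and `pick_mode` are hypothetical and not part of AudioCraft, and the 30-second threshold is simply the practical limit described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PromptSpec:
    """Hypothetical container for the fields a MusicGen prompt usually describes."""
    genre: str
    instruments: List[str] = field(default_factory=list)
    bpm: Optional[int] = None
    mood: Optional[str] = None


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty fields into one comma-separated description."""
    parts = [spec.genre.strip()]
    if spec.instruments:
        parts.append(", ".join(name.strip() for name in spec.instruments))
    if spec.bpm is not None:
        parts.append(f"{spec.bpm} BPM")
    if spec.mood:
        parts.append(spec.mood.strip())
    return ", ".join(part for part in parts if part)


def pick_mode(target_seconds: float, has_melody: bool = False,
              max_single_call: float = 30.0) -> str:
    """Suggest a mode; a simplification that ignores long melody-conditioned pieces."""
    if has_melody:
        return "melody"
    if target_seconds > max_single_call:
        return "continuation"
    return "generation"


spec = PromptSpec(
    genre="lo-fi jazz",
    instruments=["warm Rhodes piano", "soft brushed drums", "upright bass"],
    bpm=90,
    mood="relaxed",
)
print(build_prompt(spec))
print(pick_mode(target_seconds=120))
```

The resulting string is what you would pass as the text description. In AudioCraft, the `MusicGen` class exposes `generate`, `generate_continuation`, and `generate_with_chroma` for the three modes; check the current AudioCraft documentation for exact signatures before relying on them.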
Tips for better MusicGen output
- Be concrete about instruments. “warm Rhodes piano, soft brushed drums, upright bass” gives a far more controlled result than “jazzy”.
- State tempo, not key. “90 BPM, relaxed” works; an explicit key such as “in D minor” is not reliably followed, so describe the mood instead.
- Chain with continuation for length. Don’t push a single call past 30 seconds; generate a base clip, then continue from its last few seconds so each new segment inherits the style.
- Use melody conditioning to arrange. If you already have a tune, the melody-conditioned model will dress it in your described genre.
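The continuation tip comes down to simple arithmetic, sketched below. It assumes each call is capped at 30 seconds and that the tail you feed back in as the prompt counts toward that cap, so a 10-second tail leaves 20 seconds of new audio per continuation call. The function `plan_calls` and the 10-second default are hypothetical choices for illustration, not AudioCraft settings.

```python
from typing import Dict, List, Optional


def plan_calls(total_s: float, window_s: float = 30.0,
               tail_s: float = 10.0) -> List[Dict[str, Optional[float]]]:
    """Plan one base generation plus the continuation calls needed to reach total_s.

    Assumes the prompt tail counts toward the per-call window, so each
    continuation call adds at most (window_s - tail_s) seconds of new audio.
    """
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    if not 0 <= tail_s < window_s:
        raise ValueError("tail_s must be non-negative and shorter than window_s")

    # First call: plain generation with no audio prompt.
    end = min(window_s, total_s)
    calls: List[Dict[str, Optional[float]]] = [
        {"prompt_start": None, "new_start": 0.0, "new_end": end}
    ]

    # Each later call is prompted with the last tail_s seconds generated so far.
    while end < total_s:
        new_end = min(end + (window_s - tail_s), total_s)
        calls.append({"prompt_start": end - tail_s, "new_start": end, "new_end": new_end})
        end = new_end
    return calls


plan = plan_calls(total_s=120)
for index, call in enumerate(plan):
    print(index, call)
```

Under these assumptions a two-minute track needs six calls: one 30-second base and five continuations. A longer tail gives the model more context to match but adds less new audio per call, so the number of calls grows.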