Building SSML prosody tags
Text-to-speech engines read plain text in a flat, neutral voice by default.
SSML prosody tags let you shape the delivery — raising pitch for a question,
slowing the rate for emphasis, or lowering volume for an aside — without
changing a single word. This builder assembles a valid <prosody> tag from your
inputs and escapes the text so the markup never breaks.
How it works
You enter a text segment and set three controls: pitch, rate, and
volume. Each offers a preset mode (named values like high or x-slow) and
a relative mode — semitones for pitch (+2st), percent for rate (120%), and
dB for volume (+6dB). The tool wraps your escaped text in a <speak><prosody>
block with only the attributes you set, producing clean SSML compatible with AWS
Polly and Azure Speech.
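The assembly step described above can be sketched in a few lines of Python. This is a minimal illustration, not the builder's actual source: the function name build_prosody and its parameters are assumptions made for the example, and it relies only on the standard library's xml.sax.saxutils helpers for escaping.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prosody(text, pitch=None, rate=None, volume=None):
    """Return a <speak><prosody> block holding only the attributes that were set.

    Hypothetical helper for illustration; the names are not taken from the tool.
    """
    # Fixed order keeps the output predictable: pitch, rate, volume.
    attrs = {"pitch": pitch, "rate": rate, "volume": volume}
    attr_text = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value  # skip None and empty strings
    )
    # escape() replaces &, < and > so user text cannot break the markup.
    return f"<speak><prosody{attr_text}>{escape(text)}</prosody></speak>"


print(build_prosody("Is 3 < 5 & true?", pitch="+2st", rate="90%"))
```

Leaving out unset attributes, rather than writing defaults, keeps the tag short and lets the engine apply its own default for anything you did not choose.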
Tips and notes
- Relative semitones are the most musical pitch control. +2st shifts pitch predictably; the named presets are coarser steps.
- Percent rate beats presets for fine pacing. 90% is a subtle slowdown that the slow preset would overshoot.
- Keep segments short. Apply prosody to the specific phrase that needs it rather than a whole paragraph, so the rest reads naturally.
- Test in your engine. Most prosody attributes are portable, but always preview in the actual TTS voice — engines interpret extremes differently.
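To illustrate the "keep segments short" tip, here is a rough Python sketch that wraps a single phrase in a prosody tag and leaves the rest of the sentence untouched. The helper name emphasize_phrase and its keyword-argument interface are hypothetical, chosen for this example only.

```python
from xml.sax.saxutils import escape, quoteattr


def emphasize_phrase(sentence, phrase, **attrs):
    """Wrap the first occurrence of `phrase` in <prosody>; the rest stays plain.

    Hypothetical helper for illustration only.
    """
    if not phrase or phrase not in sentence:
        # Nothing to wrap: return the sentence escaped but otherwise unchanged.
        return f"<speak>{escape(sentence)}</speak>"
    before, found, after = sentence.partition(phrase)
    attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
    return (
        f"<speak>{escape(before)}"
        f"<prosody{attr_text}>{escape(found)}</prosody>"
        f"{escape(after)}</speak>"
    )


print(emphasize_phrase(
    "Please read this slowly, then continue.",
    "read this slowly",
    rate="90%",
))
```

Only the wrapped phrase is slowed down; the words on either side are read with the voice's default delivery.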