TTS SSML Markup Builder

Build SSML tags for pauses, emphasis, and prosody in AI voice generation

AI voice generators often sound robotic when you feed them raw text. SSML (Speech Synthesis Markup Language) is the XML dialect that lets you control how the voice speaks: where it pauses, which words it stresses, and how fast, high, or loud it sounds. This builder turns your plain text into valid, copy-ready SSML for Amazon Polly, Google Cloud TTS, and Azure Speech.

How it works

SSML wraps your text in a root <speak> element. Inside it you add tags:

<speak>
  <prosody rate="slow" pitch="+10%" volume="loud">
    Welcome to <emphasis level="strong">Gera</emphasis>.
    <break time="500ms"/> Let's begin.
  </prosody>
</speak>
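  • <break time="500ms"/> inserts a pause. Give the time attribute a duration in ms or s, or use the strength attribute instead with a keyword such as weak, medium, or strong: <break strength="medium"/>.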
  • <emphasis level="strong">word</emphasis> stresses a word.
  • <prosody rate pitch volume> controls speed, pitch, and loudness for the enclosed text.
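
The two forms of the break tag can be sketched in Python. This is a minimal illustration rather than the builder's actual code: break_tag is a hypothetical helper name, and the keyword set is the one listed in the SSML 1.1 specification. Individual TTS engines may accept only part of it.

```python
import re

# Strength keywords defined for <break> in SSML 1.1 (assumed here; engines may vary).
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def break_tag(value: str) -> str:
    """Return a <break/> tag for a duration ("500ms", "1.5s") or a strength keyword."""
    value = value.strip().lower()
    if value in BREAK_STRENGTHS:
        # Keywords go in the strength attribute, not in time.
        return f'<break strength="{value}"/>'
    if re.fullmatch(r"\d+(\.\d+)?(ms|s)", value):
        # Durations go in the time attribute.
        return f'<break time="{value}"/>'
    raise ValueError(f"not a duration or strength keyword: {value!r}")
```

Calling break_tag("500ms") yields the time form and break_tag("strong") yields the strength form; anything else is rejected so that a typo does not end up in the markup.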

The builder applies global prosody to your whole text, then lets you drop in break and emphasis tags so the output is always well-formed and balanced.
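
As a rough sketch of that flow, the Python below wraps a list of sentences in one global prosody element, emphasizes chosen words, and joins sentences with break tags. The function names, parameters, and defaults are hypothetical and are not taken from the builder itself. Escaping the text and generating both halves of every tag in one place is what keeps the output well-formed.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def _emphasize(sentence, words, level="strong"):
    """Escape a sentence and wrap whole-word matches in <emphasis> tags."""
    if not words:
        return escape(sentence)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    # With one capture group, re.split puts matched words at odd indices.
    parts = pattern.split(sentence)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(f'<emphasis level="{level}">{escape(part)}</emphasis>')
        else:
            out.append(escape(part))
    return "".join(out)


def build_ssml(sentences, emphasize=(), pause_ms=600,
               rate="medium", pitch="+0%", volume="medium"):
    """Build a <speak> document with global prosody, emphasis, and pauses."""
    gap = f'<break time="{int(pause_ms)}ms"/>'
    body = f" {gap} ".join(_emphasize(s, emphasize) for s in sentences)
    attrs = (f"rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
             f"volume={quoteattr(volume)}")
    ssml = f"<speak><prosody {attrs}>{body}</prosody></speak>"
    ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    return ssml


ssml = build_ssml(["Welcome to Gera.", "Let's begin."], emphasize=["Gera"],
                  pause_ms=500, rate="slow", pitch="+10%", volume="loud")
print(ssml)
```

The call at the end reproduces the sample document above on a single line. Characters such as & and < in the input are escaped before any tag is added, so user text cannot break the markup.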

Tips for natural-sounding speech

  • Pause after clauses, not every word. A 300–500 ms break after a comma and a 600–800 ms break at a full stop read naturally; longer or more frequent pauses sound halting.
  • Use emphasis sparingly. Stressing one keyword per sentence lands; stressing three flattens the effect.
  • Subtle prosody wins. A rate of slow or fast, or a pitch shift of ±10%, is enough; larger shifts sound cartoonish.
  • Flag the input as SSML. Polly needs TextType=ssml, Google needs the markup in input.ssml rather than input.text, and Azure expects the SSML document as the request body. Otherwise the tags are spoken aloud.
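
The last tip can be sketched as three request payloads, built here without sending anything. The helper names, the default voice, language, and output format are illustrative assumptions; authentication and endpoint URLs are left out. Azure generally also expects version and xml:lang attributes on the speak element and a voice element inside it, which this sketch does not add.

```python
def polly_params(ssml, voice_id="Joanna"):
    """Keyword arguments for boto3's Polly client: synthesize_speech(**params)."""
    return {
        "Text": ssml,
        "TextType": "ssml",  # without this, Polly treats the input as plain text
        "OutputFormat": "mp3",
        "VoiceId": voice_id,
    }


def google_body(ssml, language_code="en-US"):
    """JSON body for the Cloud Text-to-Speech text:synthesize REST method."""
    return {
        "input": {"ssml": ssml},  # "ssml" instead of "text"
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def azure_request(ssml, output_format="audio-16khz-128kbitrate-mono-mp3"):
    """Headers and body for the Azure Speech text-to-speech REST endpoint."""
    headers = {
        "Content-Type": "application/ssml+xml",  # marks the body as SSML
        "X-Microsoft-OutputFormat": output_format,
    }
    return headers, ssml.encode("utf-8")


doc = "<speak>Hello <break time=\"300ms\"/> world.</speak>"
params = polly_params(doc)
body = google_body(doc)
headers, payload = azure_request(doc)
```

In each case the SSML string itself is unchanged; only the field or header that carries it tells the service to parse the tags instead of reading them.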