TTS emotion and tone prompt guide
Default AI voice output is competent but flat. The difference between a robotic read and a believable performance is direction, the same kind of cues a voice actor gets from a script. Expressive TTS models respond to emotional cues, interjections, and pacing markup, but each engine wants them in a different form. This guide turns the emotion you want into the cues your specific platform understands.
How it works
Pick a target emotion and your TTS platform. Each emotion maps to a set of reliable techniques: parenthetical direction such as (warmly) or (laughing), interjection words like “oh” and “hmm” that tend to trigger a natural breath, and punctuation pacing (ellipses for hesitation, em dashes for abrupt stops, exclamation marks for energy). The tool then formats an example line for your engine: inline cues for ElevenLabs, a tone instruction for OpenAI TTS, or SSML <prosody> and <emphasis> tags for generic engines.
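The emotion-to-engine mapping described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `CUES` presets, the `format_line` helper, and every rate, pitch, and emphasis value are hypothetical. For OpenAI the function returns the tone as a separate instruction string rather than inline text; OpenAI's speech endpoint accepts this as an `instructions` parameter on models that support it (such as gpt-4o-mini-tts), but check the current API reference before relying on it.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class EmotionCues:
    direction: str     # parenthetical cue word, rendered as "(direction)"
    instruction: str   # natural-language tone instruction
    interjection: str  # lead-in word; its punctuation sets the pacing
    rate: str          # SSML <prosody> rate
    pitch: str         # SSML <prosody> pitch
    emphasis: str      # SSML <emphasis> level applied to the interjection


# Hypothetical presets for illustration only; tune them per voice and engine.
CUES = {
    "warm": EmotionCues("warmly", "Speak warmly and gently, like greeting a friend.",
                        "Oh,", "medium", "+5%", "moderate"),
    "sad": EmotionCues("somber", "Speak quietly in a somber tone, with slow pacing.",
                       "Hmm...", "slow", "-10%", "reduced"),
    "excited": EmotionCues("excitedly", "Speak with bright, high energy.",
                           "Oh!", "fast", "+15%", "strong"),
}


def format_line(text: str, emotion: str, engine: str) -> tuple[str, Optional[str]]:
    """Return (input_text, separate_instruction) for the target engine."""
    cues = CUES[emotion]
    if engine == "elevenlabs":
        # Inline parenthetical cue, as this guide describes; confirm the
        # bracket style your ElevenLabs model actually expects.
        return f"({cues.direction}) {cues.interjection} {text}", None
    if engine == "openai":
        # Tone travels separately; the input holds only words to be spoken.
        return f"{cues.interjection} {text}", cues.instruction
    if engine == "ssml":
        # Escape user text so characters like < and & stay valid XML.
        ssml = (
            f'<speak><prosody rate="{cues.rate}" pitch="{cues.pitch}">'
            f'<emphasis level="{cues.emphasis}">{escape(cues.interjection)}</emphasis> '
            f"{escape(text)}</prosody></speak>"
        )
        return ssml, None
    raise ValueError(f"unknown engine: {engine}")


if __name__ == "__main__":
    for engine in ("elevenlabs", "openai", "ssml"):
        print(engine, format_line("I didn't think you'd come back.", "sad", engine))
```

Keeping the OpenAI instruction out of the input text is the design choice that matters here: the only words in `input_text` are the ones meant to be spoken, while the SSML branch expresses the same emotion through tags instead of words.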
Tips for expressive delivery
- Layer cues with punctuation. A (somber) tag plus ellipses and shorter sentences reads as genuine sadness; the tag alone often is not enough.
- Keep emotional spans short. Models hold an emotion better over a sentence or two than across a long paragraph, so break up monologues.
- Test before committing. Some engines speak the cue text aloud. Generate a few seconds first and fall back to pacing-only techniques if a cue leaks.
- Match energy to content. Forcing high energy onto somber copy sounds uncanny; let the emotion follow the words.
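Two of these tips, keeping emotional spans short and falling back to pacing-only delivery when a cue leaks, can be sketched as small text helpers. This is a hypothetical illustration: `chunk_monologue` and `strip_cues` are invented names, cues are assumed to be parenthetical like (somber), and the regex sentence splitter is deliberately naive, so scripts with abbreviations or quoted dialogue will need something more careful.

```python
import re

# Matches a parenthetical cue such as "(somber)" plus any trailing spaces.
# Note: this also removes ordinary parentheticals in the script text.
CUE_PATTERN = re.compile(r"\([^)]*\)\s*")

# Naive splitter: breaks on whitespace after '.', '!' or '?'
# (so it also breaks after an ellipsis).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_monologue(text: str, cue: str, max_sentences: int = 2) -> list[str]:
    """Split text into short spans, re-prefixing each with the same cue."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    chunks = []
    for i in range(0, len(sentences), max_sentences):
        span = " ".join(sentences[i:i + max_sentences])
        chunks.append(f"({cue}) {span}")
    return chunks


def strip_cues(text: str) -> str:
    """Pacing-only fallback: drop parenthetical cues, keep the punctuation."""
    return CUE_PATTERN.sub("", text).strip()


if __name__ == "__main__":
    script = "I kept the letter. I never opened it. Not once. I couldn't."
    for chunk in chunk_monologue(script, "somber"):
        print(chunk)
    print(strip_cues("(somber) I kept the letter... (sighs) I never opened it."))
```

Each chunk can be generated as its own short clip, and if a test render shows the engine reading "(somber)" aloud, running the same script through `strip_cues` keeps the ellipses and sentence breaks that still carry the mood.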