Compare voice API costs across providers
Adding speech to an app means paying for text-to-speech (TTS), speech-to-text (STT), or both, and prices vary widely between OpenAI, ElevenLabs, Google, and Deepgram. This calculator takes your minutes of audio per day, a direction (TTS, STT, or both), and a quality tier, and shows the daily and monthly cost for each provider so you can compare real numbers rather than marketing pages.
How it works
Costs are normalised to a per-minute rate. STT providers already bill per minute of input audio. For TTS, character-based pricing is converted using an average of roughly 150 spoken words (about 900 characters) per minute. The daily cost is then:
daily_cost = minutes_per_day × per_minute_rate × tier_multiplier
When you select “both”, the tool sums the TTS and STT rates for each provider. Premium tiers apply a multiplier to reflect higher-accuracy STT models and more natural TTS voices.
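The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the `ProviderRates` type, the function names, the example prices, and the 30-day month are all assumptions made for the sketch. Only the 900 characters per minute figure and the daily cost formula come from the description above.

```python
from dataclasses import dataclass

CHARS_PER_MINUTE = 900  # about 150 spoken words per minute, as described above
DAYS_PER_MONTH = 30     # assumption: the text does not say how a month is counted


@dataclass(frozen=True)
class ProviderRates:
    """Hypothetical rates for one provider; not real published prices."""
    tts_per_1k_chars: float  # USD per 1,000 characters of synthesised text
    stt_per_minute: float    # USD per minute of input audio


def per_minute_rate(rates: ProviderRates, direction: str) -> float:
    """Normalise pricing to USD per minute for 'tts', 'stt' or 'both'."""
    tts = rates.tts_per_1k_chars * CHARS_PER_MINUTE / 1000
    stt = rates.stt_per_minute
    if direction == "tts":
        return tts
    if direction == "stt":
        return stt
    if direction == "both":
        return tts + stt  # "both" sums the two per-minute rates
    raise ValueError(f"unknown direction: {direction!r}")


def daily_cost(minutes_per_day: float, rates: ProviderRates,
               direction: str, tier_multiplier: float = 1.0) -> float:
    """daily_cost = minutes_per_day x per_minute_rate x tier_multiplier"""
    return minutes_per_day * per_minute_rate(rates, direction) * tier_multiplier


def monthly_cost(minutes_per_day: float, rates: ProviderRates,
                 direction: str, tier_multiplier: float = 1.0) -> float:
    """Daily cost scaled by the assumed number of days in a month."""
    return daily_cost(minutes_per_day, rates, direction, tier_multiplier) * DAYS_PER_MONTH


# Made-up example prices, for illustration only.
example = ProviderRates(tts_per_1k_chars=0.015, stt_per_minute=0.006)
print(f"daily:   ${daily_cost(100, example, 'both'):.2f}")
print(f"monthly: ${monthly_cost(100, example, 'both'):.2f}")
```

With these made-up prices, 100 minutes per day in both directions at the standard tier comes to about $1.95 per day, or about $58.50 over a 30-day month.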
Tips for managing voice spend
- Cache repeated TTS. Greetings, menu prompts and canned responses should be synthesised once and stored, not regenerated on every call.
- Match the tier to the task. Use premium voices for customer-facing audio and standard tiers for internal transcription, where occasional errors are acceptable.
- Trim silence before STT. Voice-activity detection removes dead air so you are not billed for minutes of silence.
- Batch transcription. Offline batch STT is frequently cheaper than real-time streaming for recordings that do not need an instant transcript.
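As an illustration of the first tip, repeated TTS output can be cached on disk under a hash of the voice and text. This is a minimal sketch under stated assumptions: `cached_tts` and its `synthesize` parameter are hypothetical names, and `synthesize` stands in for whatever provider call your app already makes. No real provider API is shown.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_tts(text: str, voice: str,
               synthesize: Callable[[str, str], bytes],
               cache_dir: Path) -> bytes:
    """Return audio for (text, voice), calling `synthesize` only on a cache miss.

    `synthesize` is a placeholder for your own provider call; it takes
    (text, voice) and returns the encoded audio bytes.
    """
    # Key on both voice and text so the same prompt in another voice is distinct.
    key = hashlib.sha256(f"{voice}\x00{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.audio"
    if path.exists():
        return path.read_bytes()  # cache hit: no billable synthesis
    audio = synthesize(text, voice)  # cache miss: billed once
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(audio)
    return audio
```

A greeting played on every call is then synthesised once and read from disk afterwards. If a provider's voice or model is updated, the cache directory would need clearing so that stale audio is not served.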