Multilingual token estimator
If you serve users in multiple languages, your per-request cost is not constant: the same message can cost two to four times more in Japanese or Hindi than in English, purely because of how the tokenizer splits the text. This estimator shows that gap across more than thirty languages so you can budget per market and decide where caching or model choice matters most.
How it works
Most modern tokenizers use byte-pair encoding with vocabularies learned from English-heavy training data, so English text is usually the most token-efficient and other languages carry a multiplier. The tool takes your text, estimates its English-equivalent token count, and then applies each selected language’s empirical token-density multiplier to project how the same meaning would tokenize in that language. It ranks the results from most to least expensive so the costliest languages are obvious. Everything runs in your browser.
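The estimate, multiply, and rank steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the four-characters-per-token heuristic for English, and every multiplier value are assumptions chosen for the example.

```python
import math

# Placeholder multipliers for illustration only; these are NOT the tool's
# empirical values and should be replaced with measured ones.
ILLUSTRATIVE_MULTIPLIERS = {
    "English": 1.0,
    "Spanish": 1.25,
    "Japanese": 2.5,
    "Hindi": 3.0,
}


def estimate_english_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough English token count from character length (assumed heuristic)."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / chars_per_token))


def project_tokens(text, languages, multipliers=ILLUSTRATIVE_MULTIPLIERS):
    """Project the token count per language, most expensive first."""
    base = estimate_english_tokens(text)
    projected = [(lang, math.ceil(base * multipliers[lang])) for lang in languages]
    # Sort descending so the costliest languages appear at the top.
    return sorted(projected, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    message = "Your order has shipped and will arrive on Tuesday."
    for lang, tokens in project_tokens(message, ["English", "Spanish", "Japanese", "Hindi"]):
        print(f"{lang:10s} ~{tokens} tokens")
```

Because the multiplier is applied to a single English-equivalent count, the ranking depends only on the multipliers; the base estimate affects the absolute numbers, not the order.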
Tips and notes
Use this before pricing a multilingual product: a flat per-request price that works in English can be loss-making in a language that tokenizes less efficiently. The biggest multipliers come from non-Latin scripts and agglutinative languages, so markets like Japan, India, Thailand, and the Arab world deserve extra budget headroom. If a few languages dominate your cost, consider a more multilingual-friendly model or language-specific prompt trimming. These are planning multipliers, not measurements: for exact counts on production strings, run real translations through the provider’s own tokenizer.
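To make the flat-price risk concrete, here is a small sketch that turns multipliers into a per-market margin. The function name, the price, the token cost, and the multipliers are all hypothetical values picked for illustration; substitute your own provider pricing and measured multipliers.

```python
def margin_by_market(english_tokens, multipliers, price_per_request, cost_per_1k_tokens):
    """Return (language, cost, margin) rows, worst margin first.

    All inputs are caller-supplied planning figures, not measured data.
    """
    rows = []
    for lang, multiplier in multipliers.items():
        # Projected token count scales the English baseline by the multiplier.
        cost = english_tokens * multiplier * cost_per_1k_tokens / 1000
        rows.append((lang, cost, price_per_request - cost))
    # Lowest (most negative) margin first, so loss-making markets stand out.
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 English tokens per request, a flat price of
    # 0.02 per request, and a cost of 0.01 per 1,000 tokens.
    example_multipliers = {"English": 1.0, "Japanese": 2.5, "Hindi": 3.0}
    for lang, cost, margin in margin_by_market(1000, example_multipliers, 0.02, 0.01):
        status = "loss" if margin < 0 else "ok"
        print(f"{lang:10s} cost={cost:.4f} margin={margin:+.4f} {status}")
```

With these made-up inputs, the English market clears the flat price while the higher-multiplier markets do not, which is the situation the pricing tip warns about.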