How is the cost calculated?

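Cost per request is the prompt tokens divided by one million, multiplied by the input price, plus the completion tokens divided by one million, multiplied by the output price; both prices are quoted per million tokens. The tool multiplies the per-request cost by your daily request count to get the daily figure, then multiplies the daily figure by roughly thirty to get the monthly figure. Output tokens are usually priced several times higher than input tokens, so longer completions tend to dominate the bill.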
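A minimal Python sketch of that calculation follows. The function names are illustrative, and the prices are placeholder values in US dollars per million tokens, not any provider's real rates. The 30-day month mirrors the tool's approximation.

```python
def cost_per_request(prompt_tokens: int, completion_tokens: int,
                     input_price: float, output_price: float) -> float:
    """Cost of one request in USD; prices are USD per million tokens."""
    return (prompt_tokens / 1_000_000 * input_price
            + completion_tokens / 1_000_000 * output_price)


def estimate(prompt_tokens: int, completion_tokens: int,
             input_price: float, output_price: float,
             requests_per_day: int, days_per_month: int = 30):
    """Return (per_request, daily, monthly) cost in USD."""
    per_request = cost_per_request(prompt_tokens, completion_tokens,
                                   input_price, output_price)
    daily = per_request * requests_per_day
    monthly = daily * days_per_month  # ~30 days, matching the tool's approximation
    return per_request, daily, monthly


# Placeholder workload and prices, for illustration only.
per_req, daily, monthly = estimate(prompt_tokens=1_000, completion_tokens=500,
                                   input_price=3.00, output_price=15.00,
                                   requests_per_day=10_000)
print(f"per request ${per_req:.4f}, daily ${daily:.2f}, monthly ${monthly:.2f}")
```

With these made-up numbers, each request costs $0.0105. That comes to $105 a day and $3,150 a month.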

Are the prices accurate?

They are based on providers' published rates as of the last update, but providers change pricing often. Treat the numbers as a planning guide and confirm current rates on each provider's official pricing page before committing to a budget.

How do I estimate my token counts?

A rough rule of thumb for English text is about 0.75 words per token, or roughly four characters per token. For a more precise count, run sample prompts and completions through the tokenizer for the model you plan to use, since tokenizers differ between models. Use realistic averages rather than best-case figures so your estimate is not too optimistic.
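The two rules of thumb can be sketched in Python as shown below. These are heuristics only and are no substitute for a real tokenizer. The function names and sample texts are hypothetical and are not part of the tool.

```python
def estimate_tokens_by_chars(text: str) -> float:
    """Heuristic: roughly four characters per token for English text."""
    return len(text) / 4


def estimate_tokens_by_words(text: str) -> float:
    """Heuristic: roughly 0.75 words per token, so tokens ~= words / 0.75."""
    return len(text.split()) / 0.75


def average_tokens(samples: list[str]) -> float:
    """Average the character-based estimate over realistic sample texts."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(estimate_tokens_by_chars(s) for s in samples) / len(samples)


# Hypothetical sample prompts; use real traffic where you have it.
samples = [
    "Summarize the attached support ticket in two sentences.",
    "Translate the following paragraph into French and keep the formatting.",
]
print(f"average prompt: ~{average_tokens(samples):.0f} tokens")
```

Averaging over a handful of representative samples gives a more honest input for the estimator than a single short example.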

Why is the model comparison useful?

The same workload can cost ten times more on a frontier model than on a small one. Seeing every model side by side lets you find the cheapest option that still meets your quality bar, and quantify the savings of routing easy requests to a smaller model.
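Here is one way to sketch a side-by-side comparison that picks the cheapest model clearing a quality bar. Several parts of it are assumptions, not data from the tool:

- The model names are hypothetical.
- The per-million-token prices are placeholders.
- The `quality` score stands in for whatever evaluation you run yourself.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price: float   # USD per 1M input tokens (placeholder, not a real rate)
    output_price: float  # USD per 1M output tokens (placeholder, not a real rate)
    quality: float       # hypothetical 0-1 score from your own evaluations


def monthly_cost(model: Model, prompt_tokens: int, completion_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    per_request = (prompt_tokens / 1_000_000 * model.input_price
                   + completion_tokens / 1_000_000 * model.output_price)
    return per_request * requests_per_day * days


def cheapest_meeting_bar(models: list[Model], quality_bar: float, **workload):
    """Cheapest model whose quality score meets the bar, or None if none does."""
    eligible = [m for m in models if m.quality >= quality_bar]
    if not eligible:
        return None
    return min(eligible, key=lambda m: monthly_cost(m, **workload))


MODELS = [
    Model("small-model", 0.15, 0.60, 0.70),
    Model("mid-model", 1.00, 4.00, 0.82),
    Model("frontier-model", 3.00, 15.00, 0.90),
]
workload = dict(prompt_tokens=1_000, completion_tokens=500, requests_per_day=10_000)

# Side-by-side comparison, cheapest first.
for model in sorted(MODELS, key=lambda m: monthly_cost(m, **workload)):
    print(f"{model.name:<15} ${monthly_cost(model, **workload):>10,.2f}/month")

best = cheapest_meeting_bar(MODELS, quality_bar=0.8, **workload)
print("cheapest model meeting the bar:", best.name if best else "none")
```

With these made-up numbers, the frontier model costs about 23 times as much as the small one ($3,150 against $135 a month). A bar of 0.8 selects mid-model at $900 a month.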

Does prompt caching change these numbers?

Yes, significantly. Many providers offer large discounts on repeated prompt prefixes via caching. This estimator assumes uncached pricing, so your real bill can be lower if a stable system prompt or context is cached. Treat the output as an upper-bound estimate.
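To see how caching lowers the figure, here is a rough sketch under simplifying assumptions. Cached prompt tokens are billed at a single discounted fraction of the input price, set by the hypothetical `cached_price_ratio`. Any extra charge a provider may apply when writing to the cache is ignored. Real discount structures vary by provider.

```python
def cost_with_caching(prompt_tokens: int, completion_tokens: int,
                      input_price: float, output_price: float,
                      cached_tokens: int, cached_price_ratio: float) -> float:
    """Per-request cost in USD when `cached_tokens` of the prompt are served from cache.

    cached_price_ratio is an assumed multiplier on the input price for cached
    tokens (0.1 means cached tokens cost 10% of the normal input rate).
    Any surcharge for writing to the cache is ignored.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    billable_input = uncached_tokens + cached_tokens * cached_price_ratio
    input_cost = billable_input / 1_000_000 * input_price
    output_cost = completion_tokens / 1_000_000 * output_price
    return input_cost + output_cost


# Placeholder numbers: a 3,000-token prompt whose first 2,500 tokens are a
# stable system prompt that hits the cache.
uncached = cost_with_caching(3_000, 300, 3.00, 15.00, cached_tokens=0, cached_price_ratio=0.1)
cached = cost_with_caching(3_000, 300, 3.00, 15.00, cached_tokens=2_500, cached_price_ratio=0.1)
print(f"uncached ${uncached:.5f} vs cached ${cached:.5f} per request")
```

Under these assumptions, the per-request cost halves from $0.0135 to $0.00675. Setting `cached_tokens=0` reproduces the uncached figure the estimator reports.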

LLM API Cost Estimator

Before you ship an LLM feature, you need to know what it will cost at scale. This estimator turns daily volume and average token sizes into per-request, daily and monthly cost — and compares every major model side by side so you can pick the cheapest one that meets your quality bar.

How it works

Cost per request is computed from each model’s published per-million-token rates:

cost = (prompt_tokens / 1M × input_price) + (completion_tokens / 1M × output_price)

The per-request cost is multiplied by your daily request count for the daily figure, and the daily figure is multiplied by ~30 for the monthly figure. Output tokens are priced several times higher than input tokens on most models, so verbose completions dominate the bill. All math runs locally in your browser.
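The claim that completions dominate is easy to check by splitting one request's cost into its input and output parts. This Python sketch assumes placeholder prices with output at five times the input rate. The `cost_breakdown` helper is hypothetical.

```python
def cost_breakdown(prompt_tokens: int, completion_tokens: int,
                   input_price: float, output_price: float) -> dict:
    """Split one request's cost (USD) into input and output parts."""
    input_cost = prompt_tokens / 1_000_000 * input_price
    output_cost = completion_tokens / 1_000_000 * output_price
    total = input_cost + output_cost
    return {
        "input": input_cost,
        "output": output_cost,
        "total": total,
        "output_share": output_cost / total if total else 0.0,
    }


# Placeholder prices with output at 5x the input rate; not a real provider's pricing.
for completion_tokens in (250, 500, 1_000):
    b = cost_breakdown(1_000, completion_tokens, input_price=3.00, output_price=15.00)
    print(f"{completion_tokens:>5} completion tokens -> "
          f"${b['total']:.5f}/request, output share {b['output_share']:.0%}")
```

In this example, the completion is only a quarter of the prompt's length, yet output already accounts for about 56% of the cost. At 1,000 completion tokens, that share rises to about 83%.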

Tips

Use realistic average token counts, not best-case ones; for English text, about four characters per token is a reasonable guide. The model comparison is the highest-leverage part: the same workload can cost 10× more on a frontier model than on a small one, so routing easy requests to a cheaper model is often the biggest saving available. Note that these figures assume uncached pricing. If you reuse a stable system prompt, prompt caching can cut input costs substantially, which makes this estimate an upper bound.
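The savings from routing can be estimated as a weighted blend of two per-request costs. This Python sketch makes several assumptions:

- The 60% share of easy requests is hypothetical.
- Both per-request costs are placeholders.
- Quality is not modeled; the sketch assumes the cheaper model handles the easy requests acceptably.

```python
def blended_monthly_cost(easy_fraction: float, small_cost: float, large_cost: float,
                         requests_per_day: int, days: int = 30) -> float:
    """Monthly cost (USD) when `easy_fraction` of requests goes to the cheaper model.

    small_cost and large_cost are per-request costs, e.g. from the
    per-request formula under "How it works".
    """
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be between 0 and 1")
    per_request = easy_fraction * small_cost + (1 - easy_fraction) * large_cost
    return per_request * requests_per_day * days


# Placeholder per-request costs (USD) and a hypothetical 60% share of easy requests.
large_only = blended_monthly_cost(0.0, small_cost=0.00045, large_cost=0.0105,
                                  requests_per_day=10_000)
routed = blended_monthly_cost(0.6, small_cost=0.00045, large_cost=0.0105,
                              requests_per_day=10_000)
print(f"all frontier ${large_only:,.2f}/month, routed ${routed:,.2f}/month, "
      f"saving {1 - routed / large_only:.0%}")
```

With these made-up figures, routing 60% of traffic cuts the monthly bill from $3,150 to $1,341, a saving of about 57%. The easy fraction is something you would measure on your own traffic.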