How is the monthly cost calculated?

For each model: monthly cost = (requests × input tokens ÷ 1,000,000 × input price) + (requests × output tokens ÷ 1,000,000 × output price), where token counts are per-request averages and prices are per million tokens. Input and output are priced separately because output is usually several times more expensive.

Are the prices accurate?

They reflect published list prices per million tokens and are labelled as estimates. Providers change pricing often and may offer volume discounts, so confirm the live rate in your provider dashboard before budgeting.

How do I estimate tokens if I only know word counts?

As a rough rule, one English word is about 1.3 tokens, so multiply your average word count by 1.3. For exact counts, use a model-specific tokenizer.

Is my data sent anywhere?

No. The entire calculation runs in your browser. Nothing you enter is uploaded, stored or logged.

What is the LLM Pricing Calculator?

Enter your expected monthly request volume and average input and output tokens, then compare projected monthly costs across OpenAI, Anthropic, Google, Mistral and Cohere models at current published rates — ranked cheapest first. It runs free in your browser on Gera Tools, with nothing uploaded.

LLM Pricing Calculator

Name: LLM Pricing Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Know your AI bill before you ship

Token pricing is easy to underestimate — a feature that feels cheap per request can cost thousands at scale. This calculator projects your monthly spend across ten leading models from one set of inputs, so you can pick the model that fits your budget instead of discovering the cost on your first invoice.

How to estimate your input and output tokens

If you are not sure how many tokens your requests use, start here:

One English word ≈ 1.3 tokens. A 100-word prompt is roughly 130 tokens.
A typical system prompt for a chat feature runs 200–600 tokens depending on detail. Include this in your input estimate.
Output tokens are the expensive side. Providers typically charge 3–5× more per output token than per input token, so even modest completions dominate the bill at volume.
For production, log a sample of real requests and measure actual token counts with a model-specific tokenizer. Estimates diverge from reality for multilingual text, code, and structured JSON.
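The rules of thumb above can be turned into a quick estimator. This is only a sketch: the 1.3 tokens-per-word ratio is the rough English-text figure from the list, the function names are made up for illustration, and the 400-token system prompt in the usage line is just a value picked from the 200–600 range. It is not a substitute for a real tokenizer.

```python
TOKENS_PER_WORD = 1.3  # rough rule of thumb for English text; not exact


def estimate_tokens(word_count: int) -> int:
    """Approximate the token count of English text from its word count."""
    return round(word_count * TOKENS_PER_WORD)


def estimate_input_tokens(prompt_words: int, system_prompt_tokens: int = 0) -> int:
    """Estimated input tokens per request: user prompt plus system prompt."""
    return estimate_tokens(prompt_words) + system_prompt_tokens


# A 100-word prompt sent with an assumed 400-token system prompt.
print(estimate_input_tokens(100, system_prompt_tokens=400))
```

Treat the result as a starting point for the calculator's input field, then replace it with measured counts once you have real traffic.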

How it works

You provide three numbers: monthly request volume, average input tokens, and average output tokens. For each model the calculator computes input and output cost separately — providers charge different rates for the prompt you send and the completion they generate — using published list prices per million tokens:

monthly = (requests × input_tokens  / 1e6 × input_price)
        + (requests × output_tokens / 1e6 × output_price)
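As a minimal sketch, the same formula could be written as a small Python function. The function name and the prices in the usage example are illustrative assumptions, not real provider rates and not the calculator's actual source code.

```python
def monthly_cost(
    requests: int,
    input_tokens: float,
    output_tokens: float,
    input_price: float,
    output_price: float,
) -> float:
    """Projected monthly cost in USD.

    input_tokens / output_tokens are per-request averages;
    input_price / output_price are USD per million tokens.
    """
    input_cost = requests * input_tokens / 1e6 * input_price
    output_cost = requests * output_tokens / 1e6 * output_price
    return input_cost + output_cost


# Illustrative numbers only: 100,000 requests a month, 500 input and
# 200 output tokens each, at $1.00 and $4.00 per million tokens.
cost = monthly_cost(100_000, 500, 200, input_price=1.00, output_price=4.00)
print(f"${cost:,.2f}")
```

With these made-up prices the input side contributes $50 and the output side $80, which shows how output can dominate even when completions are shorter than prompts.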

The results are ranked cheapest first, with the lowest-cost model highlighted, so the cost spread between a small, low-cost model and a premium reasoning model is immediately visible. For the same workload, that spread is often 50–100×.
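The ranking step might look something like the sketch below. The model names and prices are placeholders invented for the example, not real list prices, and `ModelPrice` and `rank_models` are hypothetical names rather than anything taken from the calculator itself.

```python
from typing import NamedTuple


class ModelPrice(NamedTuple):
    name: str
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens


def rank_models(models, requests, input_tokens, output_tokens):
    """Return (name, monthly_cost) pairs sorted cheapest first."""
    rows = []
    for m in models:
        cost = (requests * input_tokens / 1e6 * m.input_price
                + requests * output_tokens / 1e6 * m.output_price)
        rows.append((m.name, cost))
    return sorted(rows, key=lambda row: row[1])


# Placeholder prices for illustration only; check your provider's live rates.
MODELS = [
    ModelPrice("premium-model", 15.00, 60.00),
    ModelPrice("small-model", 0.20, 0.80),
    ModelPrice("mid-model", 3.00, 15.00),
]

ranking = rank_models(MODELS, requests=100_000, input_tokens=500, output_tokens=200)
for name, cost in ranking:
    print(f"{name:<15} ${cost:>10,.2f}")

# Ratio between the most and least expensive option for this workload.
spread = ranking[-1][1] / ranking[0][1]
print(f"spread: {spread:.0f}x")
```

With these placeholder numbers the cheapest option comes to about $26 a month and the most expensive to about $1,950, a spread of roughly 75×.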

Reading the comparison table

The ranking by total monthly cost is the key output. A few things to notice:

The gap between tiers is large. Frontier models often cost 10–50× more per token than their provider’s smaller models. If a task does not require frontier capability, that gap is pure savings.
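Absolute differences grow with volume. Assuming 1,000 tokens per request, a price difference of $0.50 per million tokens adds about $5 a month at 10,000 requests, which is negligible. At 10 million requests it adds about $5,000 a month, a meaningful budget line.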
Prices are list prices. Volume discounts, committed-use contracts, and cached-prompt pricing can significantly reduce the real rate; confirm with your provider for production budgeting.

Tips to cut your bill

Right-size the model. Most tasks do not need a premium model; GPT-4o mini or Gemini 1.5 Flash often deliver the result at a fraction of the cost.
Trim output tokens. Output is the expensive side. Ask for concise answers and cap max_tokens.
Cache and dedupe. Prompt caching and reusing results for repeated inputs cut input cost on high-volume pipelines.
Re-run the calculator whenever your token estimates change — small per-request differences compound fast at scale.
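To see what trimming output is worth, a rough savings estimate can be computed directly from the output term of the cost formula. The function name, the token counts and the $10 per million output price below are all assumptions chosen for illustration.

```python
def output_trim_savings(
    requests: int,
    old_output_tokens: float,
    new_output_tokens: float,
    output_price: float,
) -> float:
    """Monthly USD saved by reducing average output tokens per request.

    output_price is USD per million output tokens.
    """
    tokens_saved_per_request = old_output_tokens - new_output_tokens
    return requests * tokens_saved_per_request / 1e6 * output_price


# Illustrative: 1 million requests a month, average completion cut from
# 400 to 250 tokens, at an assumed $10 per million output tokens.
saved = output_trim_savings(1_000_000, 400, 250, output_price=10.00)
print(f"${saved:,.2f} saved per month")
```

Input cost is unchanged in this sketch, so the saving comes entirely from the output side; the same arithmetic applies to the input term if caching or shorter prompts reduce input tokens.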