How is the cost calculated?

Cost per request is prompt tokens times the model's input price plus completion tokens times its output price, both priced per million tokens. The monthly projection multiplies that by your requests per day and 30 days.

Why do input and output have different prices?

Providers generally charge more for output (generated) tokens than input (prompt) tokens because output is produced one token at a time, which takes more compute per token than reading the prompt. That is why output-heavy workloads like long-form writing cost far more than input-heavy ones like classification.

Are these prices current?

They are published list rates in USD per million tokens at the time of the last update, but providers change pricing regularly. Always confirm against each provider's official pricing page before committing to a budget.

Does this account for caching or batch discounts?

No. The calculator uses standard list pricing, so it does not include prompt-caching discounts, batch-API discounts, or volume agreements. Those can reduce real costs substantially, especially for repetitive prompts.

What is the LLM Model Pricing Calculator?

It is a free comparison tool on Gera Tools. Enter prompt and completion token estimates and requests per day to see per-request cost and a 30-day projection for every current model across OpenAI, Anthropic, Google Gemini, and Mistral in one sortable table. It runs entirely in your browser, with nothing uploaded.

LLM Model Pricing Calculator

Name: LLM Model Pricing Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Compare LLM costs across every major provider at once

Picking a model on price alone is hard when each provider lists input and output rates separately and the numbers are per million tokens. This calculator collapses all of that into one comparison: enter your typical token counts and request volume, and see the list-price per-request cost and monthly spend for every current model side by side.

How it works

You provide three numbers — average input tokens, average output tokens, and requests per day. For each model the tool computes cost per request as (input ÷ 1M × input_price) + (output ÷ 1M × output_price), then projects monthly spend by multiplying by your daily volume and 30 days. The table covers OpenAI, Anthropic, Google Gemini, and Mistral, and you can sort by per-request cost, monthly projection, or name to find the right fit. Everything runs client-side.
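The calculation and sort described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual source: the `ModelPrice` type, the function names, and the three model entries are hypothetical, and the rates are placeholders rather than real provider prices.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000
DAYS_PER_MONTH = 30  # flat 30-day month, as in the projection described above


@dataclass(frozen=True)
class ModelPrice:
    name: str            # hypothetical label, not a real model
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens


def cost_per_request(model: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """(input / 1M * input_price) + (output / 1M * output_price)."""
    return (input_tokens / TOKENS_PER_MILLION * model.input_price
            + output_tokens / TOKENS_PER_MILLION * model.output_price)


def monthly_cost(model: ModelPrice, input_tokens: int, output_tokens: int,
                 requests_per_day: int) -> float:
    """Per-request cost scaled by daily volume and a 30-day month."""
    return cost_per_request(model, input_tokens, output_tokens) * requests_per_day * DAYS_PER_MONTH


def rank_models(models, input_tokens, output_tokens, requests_per_day):
    """Return (name, per_request, monthly) rows, cheapest monthly spend first."""
    rows = [
        (m.name,
         cost_per_request(m, input_tokens, output_tokens),
         monthly_cost(m, input_tokens, output_tokens, requests_per_day))
        for m in models
    ]
    return sorted(rows, key=lambda row: row[2])


# Placeholder rates for illustration only; not real provider prices.
MODELS = [
    ModelPrice("example-large", 2.00, 8.00),
    ModelPrice("example-small", 0.20, 0.80),
    ModelPrice("example-medium", 0.50, 2.00),
]

for name, per_request, monthly in rank_models(MODELS, 1_000, 500, 1_000):
    print(f"{name:15s} ${per_request:.5f}/request  ${monthly:,.2f}/month")
```

Sorting by name or by per-request cost only requires a different `key` in `sorted`; at a fixed request volume the per-request and monthly orderings are identical.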

Tips and notes

Output tokens dominate cost for generation-heavy tasks, so estimate them carefully: a chat assistant that writes long replies can cost several times what its prompt size alone would suggest. The cheapest model is highlighted, but balance price against quality. A mini/flash/haiku tier model is often the right default, with a frontier model reserved for hard requests. These are list prices; prompt caching and batch APIs can cut real spend further, so treat the monthly figure as an upper bound and confirm rates on each provider’s pricing page before budgeting.
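The "small model by default, frontier model for hard requests" approach can be priced as a weighted average of the two per-request costs. The sketch below assumes a hypothetical `hard_fraction` (the share of traffic routed to the frontier model) and placeholder rates; neither comes from the calculator itself, which prices one model at a time.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """List-price cost of one request; prices are USD per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def blended_monthly_cost(default_cost, frontier_cost, hard_fraction,
                         requests_per_day, days=30):
    """Monthly spend when `hard_fraction` of requests go to the frontier model."""
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be between 0 and 1")
    per_request = (1 - hard_fraction) * default_cost + hard_fraction * frontier_cost
    return per_request * requests_per_day * days


# Placeholder rates for illustration only; not real provider prices.
small = request_cost(1_000, 500, 0.20, 0.80)
frontier = request_cost(1_000, 500, 2.00, 8.00)

for share in (0.0, 0.1, 0.5, 1.0):
    monthly = blended_monthly_cost(small, frontier, share, 1_000)
    print(f"{share:.0%} hard requests: ${monthly:,.2f}/month")
```

Under these assumed rates, even a modest share of frontier traffic accounts for a large part of the bill, so the routing fraction is worth estimating as carefully as the token counts.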

Understanding the input vs. output price split

Every major LLM provider charges separately for input tokens (your prompt) and output tokens (the model’s response), and output is almost always the more expensive of the two, typically 3 to 5 times the input rate. The reason is that output tokens are generated sequentially, with one forward pass through the network per token, whereas the input tokens of a prompt can be processed together in parallel.

This pricing structure has a concrete effect on workload costs:

Extraction and classification tasks (short output, long input) are relatively cheap because most tokens are input.
Long-form generation tasks (summaries, drafts, code generation with long outputs) are proportionally more expensive because output tokens dominate.
Chat with short exchanges falls somewhere in between, but system prompts — which are input tokens sent on every call — add up across a large volume of requests.
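To make the three workload profiles above concrete, the sketch below computes what fraction of each request's cost comes from output tokens, plus the monthly spend attributable to a system prompt alone. The token counts, the 800-token system prompt, the request volume, and the $1.00 / $3.00 rates are all assumed values chosen for illustration.

```python
def output_cost_share(input_tokens, output_tokens, input_price, output_price):
    """Fraction of a request's list-price cost that comes from output tokens."""
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    total = input_cost + output_cost
    return output_cost / total if total else 0.0


def system_prompt_monthly_cost(system_prompt_tokens, input_price,
                               requests_per_day, days=30):
    """Monthly spend caused by the system prompt alone (re-sent on every call)."""
    return system_prompt_tokens / 1_000_000 * input_price * requests_per_day * days


INPUT_PRICE, OUTPUT_PRICE = 1.00, 3.00  # placeholder USD rates per million tokens

# (input tokens, output tokens) per request; assumed figures for each profile.
WORKLOADS = {
    "classification": (2_000, 10),
    "long-form generation": (200, 1_500),
    "short chat": (600, 150),
}

for name, (tokens_in, tokens_out) in WORKLOADS.items():
    share = output_cost_share(tokens_in, tokens_out, INPUT_PRICE, OUTPUT_PRICE)
    print(f"{name:22s} output share of cost: {share:.0%}")

system_cost = system_prompt_monthly_cost(800, INPUT_PRICE, 10_000)
print(f"800-token system prompt at 10,000 requests/day: ${system_cost:,.2f}/month")
```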

Worked example

For a task averaging 1,000 input tokens and 500 output tokens at 1,000 requests per day, and using a model priced at $1.00 per million input tokens and $3.00 per million output tokens:

Input cost per request: 1,000 ÷ 1,000,000 × $1.00 = $0.001
Output cost per request: 500 ÷ 1,000,000 × $3.00 = $0.0015
Total per request: $0.0025
Monthly (1,000 requests/day × 30 days): $75.00
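The four lines above can be checked mechanically. The sketch below reproduces them with Python's `decimal` module so the dollar amounts come out exact rather than as floating-point approximations; the variable names are illustrative only.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

input_tokens, output_tokens = Decimal(1_000), Decimal(500)
input_price, output_price = Decimal("1.00"), Decimal("3.00")  # USD per million tokens
requests_per_day, days = Decimal(1_000), Decimal(30)

input_cost = input_tokens / MILLION * input_price      # 0.001
output_cost = output_tokens / MILLION * output_price   # 0.0015
per_request = input_cost + output_cost                 # 0.0025
monthly = per_request * requests_per_day * days        # 75

print(f"Input cost per request:  ${input_cost:.4f}")
print(f"Output cost per request: ${output_cost:.4f}")
print(f"Total per request:       ${per_request:.4f}")
print(f"Monthly:                 ${monthly:.2f}")
```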

The same workload on a model priced at $0.15 / $0.60 (a smaller tier) costs $0.00045 per request, or $13.50 per month. The comparison table in this tool makes that gap immediately visible without doing the arithmetic by hand.

What the monthly figure does not include

The calculator uses list pricing and does not account for prompt caching (which can reduce effective input cost when system prompts are repeated), batch API discounts (typically 50% off for non-real-time workloads), or volume commitments. For high-volume production use, the real cost after discounts is often meaningfully lower than the figure shown here. Treat the monthly figure as a planning ceiling, then confirm which discounts apply to your workload with your provider.
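For a rough sense of how far below the ceiling real spend might land, the list-price formula can be extended with discount parameters. This is a sketch under stated assumptions, not a feature of the calculator: `cached_fraction`, `cache_discount`, and `batch_discount` are hypothetical inputs, the way the two discounts combine here is a simplification, and actual discount levels and stacking rules differ by provider.

```python
def discounted_request_cost(input_tokens, output_tokens, input_price, output_price,
                            cached_fraction=0.0, cache_discount=0.0,
                            batch_discount=0.0):
    """Estimate one request's cost after assumed caching and batch discounts.

    cached_fraction: share of input tokens served from a prompt cache (0 to 1).
    cache_discount:  price reduction on cached input tokens (0.5 means 50% off).
    batch_discount:  reduction applied to the whole request (0.5 means 50% off).
    Prices are USD per million tokens. All three discounts are assumptions.
    """
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    billable_input = uncached + cached * (1 - cache_discount)
    input_cost = billable_input / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return (input_cost + output_cost) * (1 - batch_discount)


# Placeholder rates and discount levels, for illustration only.
list_price = discounted_request_cost(1_000, 500, 1.00, 3.00)
with_cache = discounted_request_cost(1_000, 500, 1.00, 3.00,
                                     cached_fraction=0.8, cache_discount=0.5)
with_batch = discounted_request_cost(1_000, 500, 1.00, 3.00, batch_discount=0.5)

for label, cost in [("list price", list_price),
                    ("with caching", with_cache),
                    ("with batch", with_batch)]:
    print(f"{label:13s} ${cost:.5f}/request  ${cost * 1_000 * 30:,.2f}/month")
```

Note that caching only touches the input side, so for output-heavy workloads its effect on the total is small, while a batch discount scales the whole request.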