How is the cost per request calculated?

Cost = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Providers price input and output tokens separately, and output is usually several times more expensive.

Are the prices up to date?

The model prices are editable presets based on published list prices and are clearly labelled with their verification date. Providers change pricing, so confirm the current rate in your provider's dashboard before budgeting.

Is my data sent anywhere?

No. The calculator runs entirely in your browser. Nothing you enter is uploaded, stored or logged.

How do I convert words to tokens?

As a rough rule, one English word is about 1.3 tokens and 100 tokens is about 75 words. For an exact count, use a model-specific token counter — never assume one provider's tokenizer matches another's.

What is the LLM API Cost Calculator?

A free LLM API cost calculator from Gera Tools. Pick a model (GPT, Claude, Gemini and more), enter input and output tokens and your monthly request volume, and see the cost per call and projected monthly spend instantly. It runs entirely in your browser, with nothing uploaded.

LLM API Cost Calculator

Name: LLM API Cost Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The line item that surprises new LLM builders is rarely the model choice — it is the input/output price asymmetry. Providers bill the two directions separately, and generated (output) tokens typically cost 3-5× what prompt (input) tokens cost. A chat feature that streams long answers and a summarizer that reads long documents can use identical total tokens and produce completely different bills. This calculator does the arithmetic for one request and projects it across your monthly volume.

The billing formula every provider shares

cost per request =
    (input_tokens  ÷ 1,000,000) × input_price_per_1M
  + (output_tokens ÷ 1,000,000) × output_price_per_1M

monthly spend = cost per request × requests per month
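The formula above translates directly into code. The following is a minimal sketch; the function and parameter names are illustrative and are not taken from the calculator's own source, and the token counts and volume in the usage lines are made up for the example.

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Dollar cost of one request; prices are dollars per 1,000,000 tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m
    return input_cost + output_cost


def monthly_spend(per_request: float, requests_per_month: int) -> float:
    """Project a per-request cost across a monthly request volume."""
    return per_request * requests_per_month


# Hypothetical workload: 2,000 input and 400 output tokens at $3.00/$15.00 per 1M.
per_call = cost_per_request(2_000, 400, input_price_per_1m=3.00, output_price_per_1m=15.00)
print(f"per request: ${per_call:.4f}")
print(f"per month:   ${monthly_spend(per_call, 50_000):,.2f}")
```

Prices are passed in rather than hard-coded because, as noted above, list prices change and should be rechecked before budgeting.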

A token is roughly ¾ of an English word (1,000 tokens ≈ 750 words); code and non-English text tokenize less efficiently. Tokenizers differ between providers and between model generations of the same provider, so always count with the model-specific counter when precision matters.
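For a quick budget estimate before a real tokenizer is available, the ¾-word rule of thumb can be sketched as below. This is only the heuristic from the paragraph above, applied to a whitespace word count. It is not a tokenizer, it will undercount code and non-English text, and the function name is illustrative.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English prose; an approximation, not a tokenizer


def estimate_tokens(text: str, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Rough token estimate from a whitespace word count, rounded up."""
    words = len(text.split())
    return math.ceil(words / words_per_token)


sample = " ".join(["word"] * 750)
print(estimate_tokens(sample))
```

Rounding up keeps the estimate on the conservative side for budgeting. When precision matters, replace this with the model-specific counter.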

Worked reference: current Claude API pricing

As one concrete, verifiable anchor, here are Anthropic’s published per-million-token list prices (verified June 2026 against the official pricing page; always recheck before budgeting):

Model	Input $/1M	Output $/1M
Claude Opus 4.8	$5.00	$25.00
Claude Sonnet 5	$3.00 (intro $2.00 to 2026-08-31)	$15.00 (intro $10.00)
Claude Haiku 4.5	$1.00	$5.00

Notice the two structural facts this table demonstrates: output costs exactly 5× input at every tier, and each tier down is cheaper than the one above it (at list price, about 1.7× from Opus to Sonnet and 3× from Sonnet to Haiku, so 5× from top to bottom). OpenAI and Google publish comparable per-million-token tables (openai.com/api/pricing, ai.google.dev/pricing) with the same shape: flagship, mid-tier, and small models, with output priced above input.

Worked examples with the formula

Short chat turn on a mid-tier model at $3.00/$15.00: 500 input + 300 output tokens cost (0.0005 × $3.00) + (0.0003 × $15.00) = $0.006 per request, or $600/month at 100,000 requests.

Document summarization on the same model: 8,000 input + 500 output tokens cost (0.008 × $3.00) + (0.0005 × $15.00) = $0.0315 per request. Input is 76% of the cost here ($0.024 of $0.0315), which makes this the workload where prompt caching pays most.

Same summarizer on a small model at $1.00/$5.00: (0.008 × $1.00) + (0.0005 × $5.00) = $0.0105 per request, a 3× saving for a task small models usually handle well.
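The three examples above can be reproduced, together with the input share of each bill, with a short script. This is a sketch using the token counts and prices quoted in the examples; the function name and dictionary layout are illustrative.

```python
def cost_breakdown(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Return input cost, output cost, total, and input's share of the total."""
    input_cost = input_tokens / 1_000_000 * input_price_per_1m
    output_cost = output_tokens / 1_000_000 * output_price_per_1m
    total = input_cost + output_cost
    share = input_cost / total if total else 0.0
    return {"input": input_cost, "output": output_cost, "total": total, "input_share": share}


# (input tokens, output tokens) per request, from the worked examples.
WORKLOADS = {"chat turn": (500, 300), "summarization": (8_000, 500)}
# (input $/1M, output $/1M) for the two tiers used in the examples.
TIERS = {"mid-tier": (3.00, 15.00), "small": (1.00, 5.00)}

for workload, (tokens_in, tokens_out) in WORKLOADS.items():
    for tier, (price_in, price_out) in TIERS.items():
        b = cost_breakdown(tokens_in, tokens_out, price_in, price_out)
        print(f"{workload:>14} on {tier:<8}: ${b['total']:.4f} per request, "
              f"input share {b['input_share']:.0%}")
```

Printing the input share alongside the total makes it easy to see which workloads are input-heavy and which are dominated by generated output.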

The two discounts most teams leave on the table

Batch processing: 50% off everything. Anthropic’s Message Batches API and OpenAI’s Batch API both process asynchronous jobs (usually within hours, with a 24-hour window) at half price on all tokens. Any workload that doesn’t need a real-time answer, such as nightly classification, embedding pipelines, or bulk document processing, should be batched by default.
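To see what batching is worth for a mixed workload, the monthly bill can be split into a real-time share and a batched share. The sketch below assumes a flat 50% discount on the batched share, as described above; the function name and the `batch_share` parameter are illustrative, and the example figures reuse the $0.0315 summarization request at 100,000 requests per month.

```python
BATCH_DISCOUNT = 0.5  # half price on batched tokens, per the text; confirm with your provider


def monthly_spend_with_batch(cost_per_request: float, requests_per_month: int,
                             batch_share: float,
                             batch_discount: float = BATCH_DISCOUNT) -> float:
    """Monthly spend when `batch_share` (0..1) of requests go through a batch API."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    full_price = cost_per_request * requests_per_month
    realtime = full_price * (1 - batch_share)
    batched = full_price * batch_share * (1 - batch_discount)
    return realtime + batched


for share in (0.0, 0.4, 1.0):
    spend = monthly_spend_with_batch(0.0315, 100_000, batch_share=share)
    print(f"{share:.0%} batched: ${spend:,.2f}/month")
```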

Prompt caching: up to ~90% off repeated input. If your requests share a large stable prefix (a long system prompt, a document being asked multiple questions), cached input re-reads bill at roughly 0.1× the input price on Anthropic. Cache writes cost ~1.25×, so caching pays for itself from the second request within the cache lifetime. For the long-document example above, caching the 8,000-token document turns $24 of input cost per 1,000 requests into about $2.40 once the first request has written the cache.
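The caching break-even can be checked with a small model of the prefix cost. This sketch uses the approximate multipliers quoted above (1.25× for the write, 0.1× for each read) and assumes every request lands within the cache lifetime, so there is exactly one write followed by reads. Real traffic with gaps longer than the cache lifetime will pay for additional writes. The function name is illustrative.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # approximate, per the text
CACHE_READ_MULTIPLIER = 0.10   # approximate, per the text


def prefix_input_cost(prefix_tokens: int, requests: int,
                      input_price_per_1m: float, cached: bool) -> float:
    """Input cost of a shared prefix across `requests` calls, with or without caching."""
    base = prefix_tokens / 1_000_000 * input_price_per_1m  # uncached cost per request
    if not cached or requests == 0:
        return base * requests
    # One cache write on the first request, cache reads on every later one.
    return base * CACHE_WRITE_MULTIPLIER + base * CACHE_READ_MULTIPLIER * (requests - 1)


for n in (1, 2, 1_000):
    plain = prefix_input_cost(8_000, n, 3.00, cached=False)
    with_cache = prefix_input_cost(8_000, n, 3.00, cached=True)
    print(f"{n:>5} requests: uncached ${plain:.4f}, cached ${with_cache:.4f}")
```

With a single request caching costs slightly more than not caching, because only the write premium is paid. From the second request onward the cached total is lower.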

Choosing the model tier — the biggest lever of all

Small/fast models are 5-25× cheaper per token than flagships and are adequate for a surprising share of production traffic:

Classification, routing, intent detection, extraction — small models with clear instructions are usually sufficient.
Nuanced reasoning, complex multi-step instructions, hard code — flagship models earn the premium.
The router pattern: a cheap model handles the easy majority and escalates flagged-hard cases to the expensive model. Even a crude confidence heuristic often cuts blended cost 60-80%.
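The router pattern's blended cost can be estimated with a short calculation. This sketch assumes every request is first attempted on the cheap model and that escalated requests pay for both attempts; a router that classifies before calling either model would be priced differently. The function names are illustrative. The example reuses the 8,000-input, 500-output summarization request, priced at $1.00/$5.00 for the small model and $5.00/$25.00 for the flagship.

```python
def blended_cost(cheap_cost: float, flagship_cost: float, escalation_rate: float) -> float:
    """Average cost per request when a fraction of requests escalate to the flagship."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    # Every request pays for the cheap attempt; escalated ones also pay for the flagship.
    return cheap_cost + escalation_rate * flagship_cost


def savings_vs_flagship(cheap_cost: float, flagship_cost: float, escalation_rate: float) -> float:
    """Fractional saving compared with sending everything to the flagship."""
    return 1 - blended_cost(cheap_cost, flagship_cost, escalation_rate) / flagship_cost


cheap = 0.0105     # 8,000 in + 500 out at $1.00/$5.00
flagship = 0.0525  # 8,000 in + 500 out at $5.00/$25.00
for rate in (0.05, 0.2, 0.5):
    saving = savings_vs_flagship(cheap, flagship, rate)
    print(f"escalate {rate:.0%}: ${blended_cost(cheap, flagship, rate):.4f} per request, "
          f"{saving:.0%} cheaper than flagship-only")
```

The saving depends on both the price gap between tiers and the escalation rate, so it is worth measuring the real escalation rate before relying on a projected figure.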

Why output tokens cost more

The asymmetry is physics, not marketing. Input processing (“prefill”) runs in parallel across all prompt tokens at once — efficient on GPUs. Output generation (“decode”) produces one token at a time, each step reading the full model state, which is memory-bandwidth-bound and throughput-limited. Serving one output token genuinely costs more than ingesting one input token, and pricing follows.

Practical cost-control checklist

Trim the system prompt: it is billed on every call. At 100k requests/month and $3.00 per 1M input tokens, each 1,000 prompt tokens costs $300/month before caching.
Cap max_tokens so a runaway generation cannot 10× a request.
Track cost per feature, not per account. One template change that doubles input length doubles that feature’s bill silently.
Watch reasoning/thinking tokens — reasoning-mode models bill their internal thinking as output; a “short answer” can carry thousands of billed thinking tokens.
Re-verify prices quarterly. Providers reprice often (usually downward, sometimes with introductory windows like the Sonnet 5 pricing above), and stale assumptions cut both ways.
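Several checklist items (per-feature tracking, thinking tokens, and the max_tokens cap) can be combined in one small accounting sketch. The `Usage` record and its field names are hypothetical and do not correspond to any provider's response schema. If your provider's usage report already folds thinking tokens into its output count, leave `thinking_tokens` at 0 to avoid double counting.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    """One logged request (hypothetical record layout)."""
    feature: str
    input_tokens: int
    output_tokens: int        # visible answer tokens
    thinking_tokens: int = 0  # billed at the output rate, per the text


def cost_by_feature(records, input_price_per_1m: float, output_price_per_1m: float) -> dict:
    """Sum dollar cost per feature, billing thinking tokens as output."""
    totals = defaultdict(float)
    for r in records:
        billed_output = r.output_tokens + r.thinking_tokens
        totals[r.feature] += (r.input_tokens * input_price_per_1m
                              + billed_output * output_price_per_1m) / 1_000_000
    return dict(totals)


def worst_case_cost(input_tokens: int, max_tokens: int,
                    input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Upper bound on one request's cost if generation runs to the max_tokens cap."""
    return (input_tokens * input_price_per_1m + max_tokens * output_price_per_1m) / 1_000_000


log = [
    Usage("chat", 500, 300),
    Usage("chat", 500, 300, thinking_tokens=2_000),
    Usage("summarize", 8_000, 500),
]
for feature, dollars in cost_by_feature(log, 3.00, 15.00).items():
    print(f"{feature}: ${dollars:.4f}")
print(f"worst case at max_tokens=4096: ${worst_case_cost(500, 4_096, 3.00, 15.00):.4f}")
```

In this example the second chat request has the same visible answer as the first but costs six times as much once its thinking tokens are billed.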

Sources

Preset prices in the tool are editable and labelled with their verification date; treat your provider dashboard as the source of truth. All calculation runs locally in your browser — nothing you enter is uploaded.