Why does the cheapest model change with my token ratio?

Providers price output tokens several times higher than input. A long-prompt, short-answer task favors models with cheap input, while a short-prompt, long-answer task favors cheap output. Changing the ratio re-orders the heatmap.

Are the prices current?

The model prices are editable presets based on published list prices and labelled as estimates. Providers change pricing often, so confirm the current rate in your provider dashboard before budgeting.

What do the colors mean?

Color encodes relative cost across the models shown — green is the cheapest, red the most expensive, with a smooth gradient between. It is relative to the current list, so filtering by tier rescales the colors.

Is my data sent anywhere?

No. The heatmap is computed entirely in your browser. Nothing you enter is uploaded, stored or logged.

What is the Token Cost Heatmap by Model?

The Token Cost Heatmap by Model is a free LLM token cost heatmap. Enter your input and output token counts and quality tier, and see 30+ models ranked with cost encoded as color intensity, making the cheapest model for your specific prompt-to-completion ratio visually obvious. It runs in your browser on Gera Tools, with nothing uploaded.

Token Cost Heatmap by Model

Name: Token Cost Heatmap by Model
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Token cost heatmap by model

Picking the cheapest model is not just “smallest number wins”: it depends on your prompt-to-completion ratio, because providers price output tokens far higher than input. This heatmap prices your specific request across 30+ models and colors each row by cost, so the best-value choice is obvious at a glance.

How it works

For every model the cost of one request is:

cost = (input_tokens / 1,000,000) × input_price
     + (output_tokens / 1,000,000) × output_price

The tool computes this for each model at your token counts, sorts cheapest to most expensive, and maps cost onto a green-to-red color scale relative to the models shown. A quality-tier filter lets you compare flagship, mid-range and fast models on equal footing instead of mixing a frontier model with a budget one.
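
The calculation and ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual source: the `ModelPrice` type, the `rank_models` helper, the tier labels and every price in `PRESETS` are hypothetical placeholders, with prices expressed in USD per 1,000,000 tokens.

```python
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class ModelPrice:
    name: str
    tier: str            # hypothetical tier label, e.g. "fast" or "frontier"
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


def request_cost(model: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, using the formula from the text."""
    return ((input_tokens / TOKENS_PER_MILLION) * model.input_price
            + (output_tokens / TOKENS_PER_MILLION) * model.output_price)


def rank_models(models, input_tokens: int, output_tokens: int,
                tier: Optional[str] = None):
    """Return (name, cost) pairs sorted cheapest first, optionally for one tier."""
    shown = [m for m in models if tier is None or m.tier == tier]
    costed = [(m.name, request_cost(m, input_tokens, output_tokens)) for m in shown]
    return sorted(costed, key=lambda pair: pair[1])


# Made-up example prices, NOT real provider rates.
PRESETS = [
    ModelPrice("model-a", "fast", 0.10, 1.00),
    ModelPrice("model-b", "fast", 0.50, 0.50),
    ModelPrice("model-c", "frontier", 3.00, 15.00),
]

if __name__ == "__main__":
    for name, cost in rank_models(PRESETS, input_tokens=2_000, output_tokens=20):
        print(f"{name}: ${cost:.6f}")
```

Passing `tier="fast"` to `rank_models` restricts the ranking to that group, mirroring the quality-tier filter.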

Why the prompt-to-completion ratio determines which model wins

Output tokens are consistently priced higher than input tokens — often 3× to 5× or more depending on the provider. This means the cheapest model for a task with a large input and small output is not necessarily the cheapest for a task with a small input and large output.

Consider two tasks with different token ratios:

Task A — Document classification (long input, tiny output):

2,000 input tokens, 20 output tokens
For this task, input price dominates: even with output priced 5× higher per token, input still accounts for about 95% of the cost. A model with cheap input wins.

Task B — Creative generation (short prompt, long output):

100 input tokens, 2,000 output tokens
Here output price dominates. A model with cheap output wins even if its input price is slightly higher.

Changing the ratio in the heatmap re-sorts the entire ranking. A model that appears green for Task A can appear red for Task B. This is why the heatmap is more useful than a simple “cheapest model” list.
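
To make the re-ordering concrete, here is a small Python sketch that prices Task A and Task B against two models. Both models and their prices are invented for illustration (USD per 1,000,000 tokens); `cheap-input` has the lower input price and `cheap-output` the lower output price.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request; prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical models: name -> (input_price, output_price). Not real rates.
MODELS = {
    "cheap-input": (0.10, 0.80),
    "cheap-output": (0.25, 0.40),
}

# Token counts taken from the two tasks in the text: (input, output).
TASKS = {
    "Task A (classification)": (2_000, 20),
    "Task B (generation)": (100, 2_000),
}


def cheapest(input_tokens, output_tokens):
    """Name of the model with the lowest cost at these token counts."""
    return min(
        MODELS,
        key=lambda name: request_cost(input_tokens, output_tokens, *MODELS[name]),
    )


if __name__ == "__main__":
    for task, (inp, out) in TASKS.items():
        print(f"{task}: cheapest is {cheapest(inp, out)}")
```

With these assumed prices, `cheap-input` wins Task A ($0.000216 against $0.000508) and `cheap-output` wins Task B ($0.000825 against $0.00161), so the same two models swap places when only the ratio changes.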

Reading the color scale

The color gradient (green to red) is relative to the models currently shown in the filtered list. When you filter to just “fast” models, the scale recalibrates across that group — so green is the cheapest fast model, not the cheapest model overall. This makes it easy to compare like with like within a quality tier.
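
One way to implement a relative scale like this is min-max normalization over the rows currently shown. The sketch below assumes a linear interpolation between pure green and pure red; the tool's actual palette and scaling (linear or logarithmic, for example) are not stated here, and the function names are hypothetical.

```python
def cost_to_rgb(cost, lowest, highest):
    """Map a cost to an (R, G, B) tuple: green at the lowest cost, red at the highest."""
    if highest == lowest:
        # Only one distinct cost in view: treat it as the cheapest.
        return (0, 255, 0)
    t = (cost - lowest) / (highest - lowest)  # 0.0 = cheapest, 1.0 = most expensive
    return (round(255 * t), round(255 * (1 - t)), 0)


def color_rows(costs):
    """Color every visible model relative to the cheapest and priciest one shown."""
    if not costs:
        return {}
    lowest, highest = min(costs.values()), max(costs.values())
    return {name: cost_to_rgb(c, lowest, highest) for name, c in costs.items()}


if __name__ == "__main__":
    all_models = {"a": 1.0, "b": 2.0, "c": 3.0}  # made-up costs
    print(color_rows(all_models))

    # Filtering the list rescales the colors: "b" is now the cheapest row shown.
    filtered = {name: c for name, c in all_models.items() if name != "a"}
    print(color_rows(filtered))
```

Because the minimum and maximum are recomputed from whatever is passed in, removing the cheapest row turns the next-cheapest row green, which is the recalibration described above.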

Switching between quality tiers typically shows a 5–30× cost difference between the fast and frontier groups, which is why the tier selection matters before making a deployment decision.

Tips for choosing a model

Match the ratio to the task. Summarising a long document (big input, small output) rewards models with cheap input; brainstorming or drafting (small input, big output) rewards cheap output.
Start in the fast tier. For classification, extraction and routing, a fast/mini model is often 10–20× cheaper and good enough for most accuracy requirements.
Reserve frontier models. Use them where reasoning quality clearly moves the needle, not as a default for every task.
Re-run when prices change. Edit the presets to your current contracted rates for an accurate ranking — provider pricing changes frequently and volume discounts shift the ordering.