Why separate input and output tokens per dollar?

Providers price input and output differently — output is usually three to five times more expensive. A model that is cheap for input can be expensive for output, so the most efficient model depends on whether your workload reads more than it writes.

What does input-heavy versus output-heavy mean?

Input-heavy workloads send large prompts and get short answers: summarisation, classification, retrieval. Output-heavy workloads send short prompts and generate long answers: content generation, code. Balanced workloads have roughly equal input and output. The leaderboard weights the blended price accordingly.

Does the most tokens per dollar mean the best model?

No. Value is tokens per dollar at the quality you need. A budget model with huge tokens-per-dollar is useless if it fails your task. Use the quality filter to compare only models that clear your bar, then pick the most efficient one.
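
The filter-then-rank idea can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the Model class, the integer tier field, the best_value helper, and all prices are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens
    tier: int            # hypothetical quality tier; higher is better

def best_value(models, min_tier, w_in):
    """Return the model with the most tokens per dollar among those at or
    above min_tier, or None if no model clears the bar."""
    eligible = [m for m in models if m.tier >= min_tier]
    if not eligible:
        return None

    def blended(m):
        # Weighted price per 1M tokens; the output share is assumed to be 1 - w_in.
        return w_in * m.input_price + (1.0 - w_in) * m.output_price

    # The lowest blended price buys the most tokens per dollar.
    return min(eligible, key=blended)

# Made-up models and prices, purely for illustration.
models = [
    Model("budget", 0.10, 0.40, tier=1),
    Model("mid", 0.50, 1.50, tier=2),
    Model("premium", 3.00, 15.00, tier=3),
]
```

With these made-up numbers, best_value(models, min_tier=2, w_in=0.5) returns the "mid" model: "budget" is cheaper per token but is excluded because it does not clear the quality bar.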

Are the prices current?

They are editable presets from published list prices and labelled as estimates. Providers change pricing often, so confirm the current rate before budgeting.

Is anything sent to a server?

No. The ranking is computed in your browser from a built-in price table. Nothing is uploaded.

What is the Tokens-per-Dollar Leaderboard?

The Tokens-per-Dollar Leaderboard ranks major LLMs by how many tokens you get per US dollar for input-heavy, output-heavy, or balanced workloads, so the most cost-efficient model for your use case is obvious at a glance. It runs free in your browser on Gera Tools, with nothing uploaded.

Tokens-per-Dollar Leaderboard

Name: Tokens-per-Dollar Leaderboard
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Get the most tokens for every dollar

Two models can differ tenfold in how many tokens a dollar buys. This leaderboard ranks major LLMs by tokens per dollar, weighted for how you actually use them — input-heavy, output-heavy, or balanced — so the best-value model for your workload is the one at the top.

How tokens-per-dollar is computed

For a given workload mix we blend the input and output prices using the weights w_in and w_out (the input and output shares of the workload, which sum to 1), then invert to get tokens per dollar:

blended_price = w_in × input_price + w_out × output_price   (USD per 1M tokens)
tokens_per_dollar = 1,000,000 / blended_price

For input-heavy work the input weight dominates; for output-heavy work the more expensive output price dominates, which is why output-heavy leaderboards look very different from input-heavy ones.
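
As a minimal sketch, the formula above might look like this in Python. The function name and signature are assumptions made for illustration, and the sketch assumes the output weight is simply 1 minus the input weight.

```python
def tokens_per_dollar(input_price: float, output_price: float, w_in: float) -> float:
    """Tokens bought per US dollar for a workload whose input share is w_in.

    Prices are in USD per 1M tokens. w_in is the fraction of tokens that are
    input (0 to 1); the output share is assumed to be 1 - w_in.
    """
    if not 0.0 <= w_in <= 1.0:
        raise ValueError("w_in must be between 0 and 1")
    w_out = 1.0 - w_in
    blended_price = w_in * input_price + w_out * output_price
    if blended_price <= 0:
        raise ValueError("blended price must be positive")
    return 1_000_000 / blended_price
```

For instance, tokens_per_dollar(0.50, 1.50, 0.8) blends to about $0.70 per million tokens, which works out to roughly 1.43 million tokens per dollar.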

Why input and output prices differ so much

Most LLM APIs charge output tokens at a significantly higher rate than input tokens, often three to five times more. The reason is computational: each output token is generated autoregressively and needs its own forward pass through the model, one after another, while all input tokens are processed together in a single parallel pass. Because of this price gap, a “balanced” workload costs more than a pure input-only workload at the same total token count, and the model that looks cheapest for reading large documents may not be cheapest for writing long ones.

Workload types explained

Input-heavy: Sending large documents, codebases, or conversation histories and getting short responses. Examples: summarization, classification, data extraction, and retrieval-augmented generation (RAG) where the context is large and the answer is short.

Output-heavy: Short prompts, long responses. Examples: creative writing, code generation, detailed explanations, report drafting. Output-heavy workloads are where per-token cost differences matter most, because most of the tokens you pay for are the expensive output tokens.

Balanced: Conversational back-and-forth or medium-length responses to medium-length prompts. Chat assistants and general-purpose tools typically fall here.

Worked illustration

For example, suppose Model A charges $0.50 per million input tokens and $1.50 per million output tokens. Model B charges $1.00 per million input and $1.00 per million output.

Input-heavy (80% input): Model A blended ≈ $0.70/M, Model B ≈ $1.00/M — Model A wins
Output-heavy (80% output): Model A blended ≈ $1.30/M, Model B ≈ $1.00/M — Model B wins
The leaderboard does this calculation for you across all tracked models simultaneously.
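
The same comparison can be reproduced with a short Python sketch. The prices and the 80/20 splits come from the illustration above; the function names are hypothetical and this is not the tool's own code.

```python
# Prices from the worked illustration, in USD per 1M tokens: (input, output).
PRICES = {"Model A": (0.50, 1.50), "Model B": (1.00, 1.00)}

def blended_price(input_price, output_price, w_in):
    """Weighted price per 1M tokens; the output share is assumed to be 1 - w_in."""
    return w_in * input_price + (1.0 - w_in) * output_price

def rank(prices, w_in):
    """Model names sorted from most to fewest tokens per dollar.

    The lowest blended price buys the most tokens per dollar, so sorting by
    blended price in ascending order gives the leaderboard order.
    """
    return sorted(prices, key=lambda name: blended_price(*prices[name], w_in))

print(rank(PRICES, 0.8))  # input-heavy:  ['Model A', 'Model B']
print(rank(PRICES, 0.2))  # output-heavy: ['Model B', 'Model A']
```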

Tips for using the ranking

Match the workload weighting to reality. Ranking a content-generation app on input-heavy weights will recommend the wrong model.
Filter by quality first. Set the tier so you only compare models that can do the job, then maximise tokens per dollar within that set.
Re-check prices regularly. Provider price cuts reshuffle the leaderboard; a model that was mid-pack last quarter may now top it.
Account for context caching. Several providers offer reduced rates on cached (repeated) input. If your system prompt is large and fixed, caching can substantially change the effective leaderboard position.
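
To see how caching can move a model up the ranking, here is a rough Python sketch. Everything in it is an assumption for illustration: the prices are made up, the cached rate is hypothetical, and it ignores any extra charge a provider may apply for writing to the cache.

```python
def effective_input_price(input_price, cached_price, cached_share):
    """Average input price when cached_share of input tokens hit the cache."""
    return cached_share * cached_price + (1.0 - cached_share) * input_price

def blended_price(input_price, output_price, w_in):
    """Weighted price per 1M tokens; the output share is assumed to be 1 - w_in."""
    return w_in * input_price + (1.0 - w_in) * output_price

# Made-up prices in USD per 1M tokens, purely for illustration.
# Model X offers a reduced cached-input rate; Model Y does not.
x_input, x_cached, x_output = 1.00, 0.10, 2.00
y_input, y_output = 0.60, 2.00

w_in = 0.8          # input-heavy mix
cached_share = 0.9  # assumed: 90% of input tokens are a fixed, cached prompt

x_no_cache = blended_price(x_input, x_output, w_in)
x_with_cache = blended_price(
    effective_input_price(x_input, x_cached, cached_share), x_output, w_in
)
y_blend = blended_price(y_input, y_output, w_in)

# Without caching Model Y is cheaper; with caching Model X overtakes it.
print(x_no_cache > y_blend, x_with_cache < y_blend)
```

With these numbers Model X's blended price falls from about $1.20 to about $0.55 per million tokens once caching is counted, dropping below Model Y's $0.88.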