Token cost by language-model matrix
Enter your token profile once and instantly compare the cost across every major model — GPT-4o, the o-series, Claude Opus/Sonnet/Haiku, Gemini, and popular open-weight hosts. No more opening five pricing pages: the whole field lands in a single sortable matrix with the cheapest option highlighted.
How it works
Every model bills input and output tokens separately, with prices quoted per million tokens. For each model, the tool takes your input and output token counts per request and computes:
per_request = (input / 1e6) × input_price + (output / 1e6) × output_price
monthly = per_request × requests_per_day × 30
It runs that for all models at once and sorts by monthly cost (a 30-day month, per the formula above), so the cheapest model for your specific input/output ratio rises to the top. Output-heavy workloads favour different models than input-heavy ones, which is why the comparison has to be run against your own profile.
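The calculation above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the model names and prices are made-up placeholders (not real vendor rates), and `Price`, `PRICES`, and `cost_matrix` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Placeholder prices for illustration only; these are NOT real vendor rates.
PRICES = {
    "model-a": Price(0.25, 4.00),   # cheap input, expensive output
    "model-b": Price(1.00, 2.00),   # pricier input, cheaper output
    "model-c": Price(3.00, 15.00),  # premium on both sides
}


def per_request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    # per_request = (input / 1e6) * input_price + (output / 1e6) * output_price
    return ((input_tokens / 1e6) * price.input_per_million
            + (output_tokens / 1e6) * price.output_per_million)


def monthly_cost(price: Price, input_tokens: int, output_tokens: int,
                 requests_per_day: int, days: int = 30) -> float:
    # monthly = per_request * requests_per_day * 30
    return per_request_cost(price, input_tokens, output_tokens) * requests_per_day * days


def cost_matrix(prices: dict, input_tokens: int, output_tokens: int,
                requests_per_day: int) -> list:
    # One (model, monthly cost) row per model, cheapest first.
    rows = [
        (name, monthly_cost(price, input_tokens, output_tokens, requests_per_day))
        for name, price in prices.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Same request volume, two different token profiles.
    for label, tokens_in, tokens_out in [("input-heavy", 10_000, 100),
                                         ("output-heavy", 100, 10_000)]:
        print(label)
        for name, cost in cost_matrix(PRICES, tokens_in, tokens_out, 1_000):
            print(f"  {name}: ${cost:,.2f}/month")
```

With these placeholder rates, model-a sorts first for the input-heavy profile and model-b sorts first for the output-heavy one, which is the ranking flip the paragraph above describes.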
Tips
- Output-heavy workloads (long generations) are punished hardest by models with expensive output tokens — sort the matrix and watch the order change as you raise the output count.
- The cheapest few models on price are your shortlist, not your decision. Run a quality test on your real prompts before committing.
- For mixed workloads, pair this matrix with a routing strategy — send easy requests to a cheap model and hard ones to a premium model.
- Re-check prices periodically; vendors cut (and occasionally raise) rates often.
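The routing tip above can be costed with the same per-request formula. The sketch below is a rough estimate under stated assumptions: easy and hard requests share one token profile, a fixed fraction of traffic goes to the premium model, and the prices passed in are placeholders. `blended_monthly_cost` and `hard_fraction` are hypothetical names, not part of the tool.

```python
def per_request_cost(input_tokens: int, output_tokens: int,
                     input_price: float, output_price: float) -> float:
    # Prices are USD per 1M tokens, as in the formula under "How it works".
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price


def blended_monthly_cost(cheap: tuple, premium: tuple, hard_fraction: float,
                         input_tokens: int, output_tokens: int,
                         requests_per_day: int, days: int = 30) -> float:
    """Estimate monthly cost when `hard_fraction` of requests go to the premium model.

    `cheap` and `premium` are (input_price, output_price) pairs.
    Assumption: both request types use the same token profile.
    """
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be between 0 and 1")
    cheap_cost = per_request_cost(input_tokens, output_tokens, *cheap)
    premium_cost = per_request_cost(input_tokens, output_tokens, *premium)
    # Weighted average of the two per-request costs.
    blended = (1.0 - hard_fraction) * cheap_cost + hard_fraction * premium_cost
    return blended * requests_per_day * days


if __name__ == "__main__":
    # Placeholder prices: (input, output) in USD per 1M tokens.
    cheap, premium = (1.00, 2.00), (10.00, 20.00)
    for share in (0.0, 0.2, 1.0):
        cost = blended_monthly_cost(cheap, premium, share, 2_000, 500, 1_000)
        print(f"{share:.0%} routed to premium: ${cost:,.2f}/month")
```

If hard requests tend to be longer than easy ones, a closer estimate would cost each group with its own token profile and add the two results.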