Token Cost by Language Model Matrix

Full cost matrix: every major model × your token volumes



Enter your token profile once (input tokens and output tokens per request, plus requests per day) and compare the cost across every major model: GPT-4o, the o-series, Claude Opus/Sonnet/Haiku, Gemini, and popular open-weight hosts. No more opening five pricing pages: the whole field lands in a single sortable matrix with the cheapest option highlighted.

How it works

Providers bill input and output tokens separately, with prices quoted per million tokens. Taking input and output as the token counts for one request, the tool computes for each model:

per_request = (input / 1e6) × input_price + (output / 1e6) × output_price
monthly     = per_request × requests_per_day × 30

It runs that for all models at once and sorts by monthly cost, so the cheapest model for your specific input/output ratio rises to the top. Output-heavy workloads favour different models than input-heavy ones, which is why doing this per-profile matters.
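The calculation and sort described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the `ModelPrice` type, the `cost_matrix` function, the model names, and every price below are hypothetical placeholders chosen only to show how the ranking can flip between an input-heavy and an output-heavy profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    input_price: float   # price per 1M input tokens (placeholder value)
    output_price: float  # price per 1M output tokens (placeholder value)


def per_request_cost(model, input_tokens, output_tokens):
    """Cost of one request, using the per-million-token formula."""
    return (input_tokens / 1e6) * model.input_price \
        + (output_tokens / 1e6) * model.output_price


def cost_matrix(models, input_tokens, output_tokens, requests_per_day):
    """Return (name, per_request, monthly) rows sorted by monthly cost."""
    rows = []
    for model in models:
        per_request = per_request_cost(model, input_tokens, output_tokens)
        monthly = per_request * requests_per_day * 30  # 30-day month
        rows.append((model.name, per_request, monthly))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical models with made-up prices, NOT real vendor rates.
MODELS = [
    ModelPrice("model-a", input_price=1.00, output_price=8.00),
    ModelPrice("model-b", input_price=3.00, output_price=4.00),
]

# Same request volume, opposite input/output ratios.
input_heavy = cost_matrix(MODELS, input_tokens=4000, output_tokens=200,
                          requests_per_day=1000)
output_heavy = cost_matrix(MODELS, input_tokens=200, output_tokens=4000,
                           requests_per_day=1000)

for label, rows in (("input-heavy", input_heavy), ("output-heavy", output_heavy)):
    print(label)
    for name, per_request, monthly in rows:
        print(f"  {name}: {per_request:.4f} per request, {monthly:.2f} per month")
```

With these placeholder prices, model-a comes out cheapest for the input-heavy profile and model-b for the output-heavy one, which is the effect the per-profile sort is meant to expose.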

Tips

  • Output-heavy workloads (long generations) are punished hardest by models with expensive output tokens — sort the matrix and watch the order change as you raise the output count.
  • The cheapest model on price is your shortlist, not your decision. Run a quality test on your real prompts before committing.
  • For mixed workloads, pair this matrix with a routing strategy — send easy requests to a cheap model and hard ones to a premium model.
  • Re-check prices periodically; vendors cut (and occasionally raise) rates often.
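
To make the routing tip concrete, the sketch below estimates the blended monthly cost when a share of requests goes to a cheap model and the rest to a premium one. It reuses the same per-request formula and 30-day month as above. The function names, the `easy_fraction` parameter, and all prices are assumptions for illustration only, and it assumes easy and hard requests have the same token counts, which real traffic often will not.

```python
def monthly_cost(input_tokens, output_tokens, input_price, output_price,
                 requests_per_day):
    """Monthly cost for one model (prices are per 1M tokens, 30-day month)."""
    per_request = (input_tokens / 1e6) * input_price \
        + (output_tokens / 1e6) * output_price
    return per_request * requests_per_day * 30


def routed_monthly_cost(cheap_prices, premium_prices, easy_fraction,
                        input_tokens, output_tokens, requests_per_day):
    """Blend two models: easy_fraction of traffic goes to the cheap one."""
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be between 0 and 1")
    easy_requests = requests_per_day * easy_fraction
    hard_requests = requests_per_day * (1.0 - easy_fraction)
    return (
        monthly_cost(input_tokens, output_tokens, *cheap_prices, easy_requests)
        + monthly_cost(input_tokens, output_tokens, *premium_prices, hard_requests)
    )


# Hypothetical (input_price, output_price) pairs, NOT real vendor rates.
CHEAP = (0.50, 2.00)
PREMIUM = (5.00, 20.00)

all_premium = routed_monthly_cost(CHEAP, PREMIUM, 0.0, 1000, 500, 1000)
routed = routed_monthly_cost(CHEAP, PREMIUM, 0.8, 1000, 500, 1000)

print(f"all premium: {all_premium:.2f}")
print(f"80% routed to cheap model: {routed:.2f}")
```

Varying `easy_fraction` shows how much of the saving depends on the share of traffic the cheap model can handle, which is what a quality test on real prompts has to establish.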