Tokens-per-Second Speed vs Cost Calculator

Rank models by tokens/second per dollar for throughput-sensitive tasks

Pick the right model when both speed and cost matter

For streaming chat, real-time agents, and large batch jobs, the cheapest model is not always the best choice: a model that is twice as fast for a similar price clears your queue sooner and frees up infrastructure. This tool normalizes tokens-per-second and cost into one efficiency score that you can tune with a single cost-sensitivity slider.

How the leaderboard is built

Each model’s throughput and cost are scaled to the 0-1 range across the model set. The blended score rewards speed and penalizes cost, with a weight between 0 and 1 set by the slider:

score = weight × norm(speed)
      + (1 − weight) × (1 − norm(cost))

At weight = 1 the ranking is pure speed; at weight = 0 it is pure cost; in between you get a balanced view. A latency threshold separately flags models that start responding too slowly for interactive use.
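The scoring above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual code: the page only says values are "scaled to 0-1", so min-max scaling is assumed here, and the `Model` fields, the cost unit, and the 1.0-second default latency threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Model:
    # All field names and units are hypothetical, chosen for illustration.
    name: str
    tokens_per_second: float
    cost_per_million_tokens: float
    first_token_latency_s: float


def min_max(values):
    """Scale values to 0-1 (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when every model ties, treat all as 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(models, weight, latency_threshold_s=1.0):
    """Return (name, score, too_slow) tuples, best score first."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    speed = min_max([m.tokens_per_second for m in models])
    cost = min_max([m.cost_per_million_tokens for m in models])
    rows = []
    for m, s, c in zip(models, speed, cost):
        # score = weight * norm(speed) + (1 - weight) * (1 - norm(cost))
        score = weight * s + (1.0 - weight) * (1.0 - c)
        # The latency flag is separate from the score.
        too_slow = m.first_token_latency_s > latency_threshold_s
        rows.append((m.name, score, too_slow))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up numbers, not measurements of any real model.
    models = [
        Model("fast-pricey", 200.0, 10.0, 0.3),
        Model("mid", 120.0, 4.0, 0.6),
        Model("slow-cheap", 50.0, 1.0, 1.5),
    ]
    for weight in (0.0, 0.5, 1.0):
        print(weight, rank(models, weight))
```

With these made-up numbers, the fastest model leads at weight = 1, the cheapest at weight = 0, and the middle model comes out ahead at weight = 0.5 because it is reasonably good on both axes.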

How to use the result

If you are building an interactive assistant, keep the weight high (closer to 1, favoring speed) and watch the latency flag. For overnight batch jobs where users never wait, push the weight toward 0 so that cost dominates. Always validate the chosen model’s real throughput on your own prompts and in your own region before standardizing on it.
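One way to run that validation is to time a streamed response yourself. The sketch below assumes your client exposes the response as a Python iterable that yields one token (or chunk) at a time. `measure_stream` and `fake_stream` are hypothetical helper names, and `fake_stream` only simulates a model with fixed delays so the example runs without any provider SDK.

```python
import time
from typing import Iterable, Iterator


def measure_stream(tokens: Iterable[str]) -> dict:
    """Time an iterable of streamed tokens.

    Returns the token count, time to first token in seconds
    (None if nothing arrived), and overall tokens per second.
    """
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        count += 1
    elapsed = time.perf_counter() - start
    return {
        "tokens": count,
        "time_to_first_token_s": (
            None if first_token_at is None else first_token_at - start
        ),
        "tokens_per_second": count / elapsed if count and elapsed > 0 else 0.0,
    }


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Stand-in for a real streaming response (simulation only)."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, 0.01)))
```

In practice you would replace `fake_stream` with your own client's streaming iterator and repeat the measurement over several representative prompts, since a single run can be noisy.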
