AI Model Pricing Comparison: Cost per Million Tokens

How much does each AI model cost? Full price breakdown

How AI pricing works

Almost all AI APIs charge by the token, at two separate rates: one for input tokens (the prompt you send, including any system instructions and documents) and a higher one for output tokens (the text the model generates). A token is roughly four characters or three-quarters of an English word, so a 750-word answer is about 1,000 tokens. Output costs more because the model must run a full forward pass for every token it produces, one after another, while the input tokens can be processed together in parallel. The headline figures are usually quoted as dollars per million tokens, which sounds large but works out small per request. Multiply by millions of requests, though, and the difference between models becomes a serious budget line.
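The per-token arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not any provider's billing logic: the function names are made up for this example, the prices are placeholders rather than real rates, and the word-to-token ratio is the rough rule of thumb given above (real tokenizers vary by model and language).

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from an English word count (1 token ~ 0.75 words)."""
    return round(word_count / 0.75)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given prices in dollars per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Placeholder prices for illustration only, not any provider's real rates.
INPUT_PRICE = 2.00   # dollars per million input tokens (assumed)
OUTPUT_PRICE = 8.00  # dollars per million output tokens (assumed)

answer_tokens = estimate_tokens(750)  # the 750-word answer from the text
cost = request_cost(200, answer_tokens, INPUT_PRICE, OUTPUT_PRICE)
print(f"{answer_tokens} output tokens, ${cost:.4f} per request")
```

With these assumed prices, a 200-token prompt and a 750-word answer come to well under one cent, and almost all of that is the output side.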

Comparing the tiers

The market splits cleanly into flagship and small/fast tiers. Flagship models — the top GPT, Claude, and Gemini models — are the most capable and the most expensive, with output prices that are typically several times their input prices. Below them sit mini, flash, and haiku tiers: dramatically cheaper, faster, and good enough for a huge share of real work like classification, extraction, routing, and summarisation. Open-weight models such as Llama 3 and Mistral add a third option: if you self-host, the per-token cost is your own GPU and electricity bill rather than a per-call fee, which can be cheaper at very high volume but adds operational complexity. Because exact prices change frequently, always confirm current rates on each provider’s official pricing page before budgeting — treat any quoted number as indicative.
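The "cheaper at very high volume" point about self-hosting is a break-even calculation. The sketch below assumes a fixed monthly hosting cost and a single blended API price. Both figures are hypothetical, and the function name is invented for this example. It ignores engineering time and whether the hardware can actually serve that volume, so treat the result as a rough lower bound on the volume needed.

```python
def break_even_tokens_per_month(monthly_hosting_cost: float,
                                api_price_per_m: float) -> float:
    """Monthly token volume above which a fixed hosting cost beats a per-token API.

    monthly_hosting_cost: GPUs, electricity, etc. in dollars per month (assumed fixed)
    api_price_per_m: blended API price in dollars per million tokens
    """
    if api_price_per_m <= 0:
        raise ValueError("api_price_per_m must be positive")
    return monthly_hosting_cost / api_price_per_m * 1_000_000


# Hypothetical figures for illustration only.
HOSTING_COST = 2_000.0  # dollars per month for self-hosting (assumed)
API_PRICE = 0.50        # dollars per million tokens on a small hosted tier (assumed)

threshold = break_even_tokens_per_month(HOSTING_COST, API_PRICE)
print(f"Self-hosting breaks even at about {threshold / 1e9:.1f} billion tokens per month")
```

Below that volume the per-token API is the cheaper choice under these assumptions. Above it, the fixed hosting cost is spread over enough tokens to win.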

Worked cost examples

A concrete example clarifies the math. Suppose a support chatbot receives 500-token questions and returns 300-token answers. If a flagship model costs a few dollars per million input tokens and several times that for output, a single question-and-answer exchange costs a fraction of a cent, but at a million exchanges a month that adds up to a bill in the thousands of dollars. Swap to a small-tier model at a tenth of the price and the bill for the same volume drops by an order of magnitude. For document processing, the balance shifts: feeding in long documents (large input) to produce short summaries (small output) is comparatively cheap per token processed, because the expensive output side is small. Knowing your input-to-output ratio is the single most useful thing for predicting cost.
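The same example can be run with numbers plugged in. The prices below are hypothetical, chosen only to make the arithmetic easy to follow, and the 20,000-token document is an assumed size for the summarisation case. The class and function names are likewise just for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float   # dollars per million input tokens
    output_per_m: float  # dollars per million output tokens


def cost_per_request(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the given prices."""
    return (input_tokens * p.input_per_m
            + output_tokens * p.output_per_m) / 1_000_000


def blended_price_per_m(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Average dollars per million tokens across input and output combined."""
    total_tokens = input_tokens + output_tokens
    return cost_per_request(p, input_tokens, output_tokens) / total_tokens * 1_000_000


# Hypothetical prices, not any provider's real rates.
FLAGSHIP = Pricing(input_per_m=3.00, output_per_m=15.00)
SMALL = Pricing(input_per_m=0.30, output_per_m=1.50)  # a tenth of FLAGSHIP

REQUESTS_PER_MONTH = 1_000_000

# Support chatbot: 500 tokens in, 300 tokens out.
for name, pricing in [("flagship", FLAGSHIP), ("small", SMALL)]:
    monthly = cost_per_request(pricing, 500, 300) * REQUESTS_PER_MONTH
    print(f"chatbot on {name}: ${monthly:,.0f} per month")

# Long input, short output: the average price per token falls.
chat_blend = blended_price_per_m(FLAGSHIP, 500, 300)
doc_blend = blended_price_per_m(FLAGSHIP, 20_000, 300)
print(f"chatbot blended:    ${chat_blend:.2f} per million tokens")
print(f"summariser blended: ${doc_blend:.2f} per million tokens")
```

Under these assumed prices the chatbot costs about $6,000 a month on the flagship and about $600 on the small tier. The summariser pays less than half the chatbot's average price per token, because so little of its traffic is billed at the output rate.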

Cutting the bill

Three levers reduce spend without much quality loss. First, right-size the model: route easy requests to a cheap tier and reserve the flagship for genuinely hard ones. A two-model cascade can cut cost several-fold, and by more when nearly all traffic can go to the cheap tier. Second, use prompt caching where offered, which typically discounts a reused prefix such as a fixed system prompt or reference document by roughly fifty to ninety percent on repeat calls, depending on the provider. Third, trim the prompt: shorter system instructions and tighter context reduce input tokens on every single request. Combine these and a workload that looked expensive at flagship list price can become comfortably affordable. The discipline that pays off most is simply measuring your real token usage before assuming you need the most expensive option.
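The first two levers can be estimated before any code is deployed. The sketch below is a simplified model with assumed numbers. It ignores the cost of the routing step, any requests that have to be retried on the flagship, and any surcharge a provider applies when a cache entry is first written. The function names are invented for this example.

```python
def cascade_cost_factor(cheap_share: float, cheap_price_ratio: float) -> float:
    """Blended cost relative to sending every request to the flagship.

    cheap_share: fraction of requests routed to the cheap tier (0 to 1)
    cheap_price_ratio: cheap-tier price divided by flagship price
    """
    return cheap_share * cheap_price_ratio + (1 - cheap_share)


def cached_input_cost(prefix_tokens: int, fresh_tokens: int,
                      input_price_per_m: float, cache_discount: float) -> float:
    """Input cost in dollars of one repeat call with a cached prefix.

    cache_discount: fraction taken off the cached prefix, e.g. 0.9 for 90%
    """
    prefix = prefix_tokens * input_price_per_m * (1 - cache_discount)
    fresh = fresh_tokens * input_price_per_m
    return (prefix + fresh) / 1_000_000


# Assumed: 90% of traffic is easy, and the cheap tier costs a tenth of the flagship.
factor = cascade_cost_factor(cheap_share=0.9, cheap_price_ratio=0.1)
print(f"cascade bill is {factor:.2f}x the all-flagship bill")

# Assumed: 2,000-token system prompt reused on every call, 500 fresh tokens,
# $3 per million input tokens, 90% discount on the cached prefix.
uncached = cached_input_cost(2_000, 500, 3.00, cache_discount=0.0)
cached = cached_input_cost(2_000, 500, 3.00, cache_discount=0.9)
print(f"input cost per call: ${uncached:.4f} uncached, ${cached:.4f} cached")
```

With these assumptions the cascade brings the bill to roughly a fifth of the all-flagship figure, and caching cuts the input cost of each repeat call by about 70 percent. Trimming the prompt lowers `prefix_tokens` or `fresh_tokens` directly.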
