Question 1

Why are output tokens usually more expensive than input tokens?

Accepted Answer

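Output tokens are generated one at a time. Each new token needs its own forward pass through the model, and that pass cannot start until the previous token exists. Input tokens, by contrast, are processed in parallel during the initial prompt ingestion, which uses the hardware far more efficiently. Because output is the more expensive part per token, providers price it higher, often about three to five times the input rate. So a chatbot that writes long replies costs more than one that reads an equally long document and answers briefly, although a reading-heavy workload can still cost more overall if its input volume is large enough.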
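To make the asymmetry concrete, here is a minimal sketch comparing two requests with the same total token count. The prices (3 USD per million input tokens, 15 USD per million output tokens, a 5x ratio) are hypothetical placeholders, not any provider's actual rates.

```python
# Hypothetical per-million-token prices in USD; not real provider rates.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the hypothetical prices above."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Both requests use 10,000 tokens in total.
long_reply = request_cost(input_tokens=1_000, output_tokens=9_000)
long_read = request_cost(input_tokens=9_000, output_tokens=1_000)

print(f"short prompt, long reply:   ${long_reply:.4f}")  # $0.1380
print(f"long document, short reply: ${long_read:.4f}")   # $0.0420
```

With equal totals, the reply-heavy request costs roughly 3.3 times as much under these assumed prices.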

Question 2

How do I estimate the cost of a request before sending it?

Accepted Answer

Divide your expected input tokens by one million and multiply by the input price per million, do the same for your expected output tokens with the output price, and add the two. Output length is not known in advance, so use a typical reply length, or the request's maximum output-token limit if you want an upper bound. As a rough guide, one token is about four characters or three-quarters of a word in English, so a 1,000-word prompt is about 1,300 tokens. Use a tokenizer tool for accuracy, especially for code or non-English text, where token counts run higher.
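The arithmetic above can be sketched as follows. This is a rough estimator assuming the three-quarters-of-a-word heuristic and hypothetical prices of 3 and 15 USD per million tokens; the helper names and the 500-token output cap are illustrative, and a real tokenizer should replace `words_to_tokens` when accuracy matters.

```python
# Rule of thumb: about 0.75 English words per token, i.e. roughly 4/3 tokens
# per word. This is only a heuristic; use a real tokenizer for accuracy.
TOKENS_PER_WORD = 4 / 3


def words_to_tokens(word_count: int) -> int:
    """Rough English token estimate from a word count."""
    return round(word_count * TOKENS_PER_WORD)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated USD cost: each token count over one million, times its price."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


prompt_tokens = words_to_tokens(1_000)  # 1333 tokens
# Output length is unknown in advance; a hypothetical cap of 500 output
# tokens turns the estimate into an upper bound.
max_output_tokens = 500
cost = estimate_cost(prompt_tokens, max_output_tokens,
                     input_price_per_m=3.00, output_price_per_m=15.00)
print(prompt_tokens, f"${cost:.6f}")
```

Swapping in your provider's published per-million prices is the only change needed to reuse `estimate_cost`.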

Question 3

What is prompt caching and how does it cut cost?

Accepted Answer

Several providers let you cache a large, reused prefix, such as a long system prompt or document, so that later requests reusing it are billed at a steep discount on the cached portion, often fifty to ninety percent off. For applications that send the same context repeatedly, such as a support bot with fixed instructions, caching can dramatically reduce the per-request bill. Check each provider's caching terms: discounts, minimum cacheable sizes, and cache lifetimes differ, and some providers charge extra to write a prefix into the cache.
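A minimal sketch of the savings, assuming hypothetical prices, a 90% discount on cached input tokens, and that every request hits the cache. It deliberately ignores cache-write surcharges, cache expiry, and minimum sizes, all of which vary by provider.

```python
# Hypothetical prices and discount; real caching terms differ by provider
# (discount size, minimum cacheable length, cache lifetime, write surcharges).
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00
CACHED_READ_MULTIPLIER = 0.10  # cached input tokens billed at 10% (90% off)


def cost_per_request(prefix_tokens: int, fresh_tokens: int,
                     output_tokens: int, cached: bool) -> float:
    """USD cost of one request whose first prefix_tokens may come from cache."""
    prefix_rate = INPUT_PRICE_PER_M * (CACHED_READ_MULTIPLIER if cached else 1.0)
    return (prefix_tokens * prefix_rate
            + fresh_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


requests = 1_000
# 10,000-token fixed instructions, 200-token user question, 300-token reply.
uncached_total = requests * cost_per_request(10_000, 200, 300, cached=False)
# Simplification: every request is a cache hit; the initial write is ignored.
cached_total = requests * cost_per_request(10_000, 200, 300, cached=True)
print(f"uncached: ${uncached_total:.2f}  cached: ${cached_total:.2f}")
```

Under these assumptions the bill for 1,000 requests falls from about $35.10 to about $8.10, and most of what remains is output tokens, which caching does not discount.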

Question 4

Are cheaper small models good enough to save money?

Accepted Answer

Often, yes. Mini and flash tier models cost a fraction of flagship models and handle classification, extraction, summarisation, and routing well. A common pattern is to send easy requests to a cheap model and escalate only hard ones to a flagship; when most traffic is easy, this can cut cost by up to an order of magnitude with little quality loss. Always test the cheaper model on your actual tasks before assuming the flagship is necessary.
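How much routing saves depends heavily on the share of traffic the cheap model can keep. The sketch below uses two hypothetical tiers (3/15 and 0.15/0.60 USD per million input/output tokens) and a fixed request size; the `Model` class and `blended_cost` helper are illustrative names, not any provider's API.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price_per_m: float   # USD per million input tokens
    output_price_per_m: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one request on this model."""
        return (input_tokens * self.input_price_per_m
                + output_tokens * self.output_price_per_m) / 1_000_000


# Hypothetical tiers; substitute your providers' current rates.
FLAGSHIP = Model("flagship", 3.00, 15.00)
SMALL = Model("small", 0.15, 0.60)


def blended_cost(small_share: float, input_tokens: int, output_tokens: int) -> float:
    """Average per-request cost when small_share of traffic stays on SMALL."""
    return (small_share * SMALL.cost(input_tokens, output_tokens)
            + (1 - small_share) * FLAGSHIP.cost(input_tokens, output_tokens))


flagship_only = FLAGSHIP.cost(2_000, 500)
for share in (0.80, 0.95):
    routed = blended_cost(share, 2_000, 500)
    print(f"{share:.0%} routed to small model: "
          f"{flagship_only / routed:.1f}x cheaper than flagship-only")
```

With these numbers, keeping 80% of traffic on the small model saves about 4x, while 95% saves about 11x: the escalated share dominates the bill. The sketch also ignores the cost of the routing step itself and of requests that try the small model first and are then re-run on the flagship.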

AI Model Pricing Comparison: Cost per Million Tokens

How AI pricing works

Comparing the tiers

Worked cost examples

Cutting the bill