Prompt Prefill Token Cost Calculator

Calculate cost savings from Anthropic-style assistant response prefilling

Ad placeholder (leaderboard)

Prompt prefill token cost calculator

Output tokens are the expensive half of an LLM bill — typically three to five times the price of input tokens. Prefilling the assistant’s response lets you supply the boilerplate yourself instead of paying the model to generate it. This tool quantifies exactly how much you save per request, per day and per month.

How it works

When you prefill, you provide the first part of the assistant’s reply and the model continues from there. You are billed only for what the model generates, not for the prefix you wrote. So the saving per request is the number of output tokens the model no longer has to produce:

saved_output_tokens = tokens the model would have generated for that prefix
saving = (saved_output_tokens / 1,000,000) × output_price

Multiply by your daily request volume to get daily savings, and by thirty for a monthly figure. The savings compound with scale — a tiny per-request gain becomes meaningful across millions of calls.

Tips and notes

The biggest wins come from removing predictable boilerplate: opening JSON braces, fixed report headers, “Sure, here is…” preambles, or a known sentence stem. Prefilling does double duty — besides cutting cost it forces the output into the shape you want, which is the most reliable way to get clean JSON out of a model. Keep prefills short and unambiguous; an overly long prefill can box the model into a structure it cannot complete naturally. Always test that the model continues coherently from your prefix, and remember that some providers strip trailing whitespace from the prefill, which can affect formatting.

Ad placeholder (rectangle)