Cost Sensitivity Analyzer

See whether shorter prompts or shorter completions cut your LLM bill more.



Should you spend effort trimming the prompt or capping the completion? It depends on which one your bill is more sensitive to. This analyzer takes your baseline token counts and model, then sweeps prompt length and completion length independently across a range, so you can see which lever moves total cost more and stop optimizing the one that barely matters.

How it works

The cost of one request is:

cost = (input_tokens / 1,000,000) × input_price
     + (output_tokens / 1,000,000) × output_price
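
As a quick illustration, the formula can be written as a small function. This is a minimal sketch: the function name and the token counts and prices below are hypothetical, chosen only to make the arithmetic easy to follow, and do not reflect any particular model's rates.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request in dollars.

    Prices are quoted in dollars per 1,000,000 tokens, as in the formula above.
    """
    input_cost = (input_tokens / 1_000_000) * input_price
    output_cost = (output_tokens / 1_000_000) * output_price
    return input_cost + output_cost


# Hypothetical baseline: 2,000 prompt tokens, 500 completion tokens,
# $3.00 per million input tokens, $15.00 per million output tokens.
example = request_cost(2_000, 500, input_price=3.00, output_price=15.00)
print(f"${example:.4f}")  # $0.0135
```

Here the input side contributes $0.0060 and the output side $0.0075, so the shorter completion still accounts for more than half of the bill.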

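The tool holds one value at its baseline and varies the other from −range to +range (e.g. −50% to +50%), then repeats with the roles swapped. Output tokens are usually priced higher than input tokens, so the completion curve is often steeper, but the result depends on your ratio of input to output tokens: because cost is linear in both, each lever's swing is proportional to its share of the baseline cost, and a long prompt paired with a short completion can make input the dominant lever. The analyzer reports the cost at the low and high end of each sweep and the swing each lever produces, so the dominant factor is obvious.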
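
The sweep described above could be sketched roughly as follows. This is not the analyzer's actual code; the function names, the returned dictionary layout, and the baseline numbers are all assumptions made for illustration.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request; prices are dollars per 1,000,000 tokens."""
    return ((input_tokens / 1_000_000) * input_price
            + (output_tokens / 1_000_000) * output_price)


def sweep(base_in, base_out, input_price, output_price, range_pct=50):
    """Vary one lever from -range_pct to +range_pct while holding the other fixed.

    Returns {lever: (cost_at_low_end, cost_at_high_end, swing)}.
    """
    low = 1 - range_pct / 100
    high = 1 + range_pct / 100

    # Prompt sweep: completion stays at its baseline.
    prompt_low = request_cost(base_in * low, base_out, input_price, output_price)
    prompt_high = request_cost(base_in * high, base_out, input_price, output_price)

    # Completion sweep: prompt stays at its baseline.
    comp_low = request_cost(base_in, base_out * low, input_price, output_price)
    comp_high = request_cost(base_in, base_out * high, input_price, output_price)

    return {
        "prompt": (prompt_low, prompt_high, prompt_high - prompt_low),
        "completion": (comp_low, comp_high, comp_high - comp_low),
    }


# Hypothetical baseline: 2,000 prompt tokens, 500 completion tokens,
# $3.00 / $15.00 per million input / output tokens, swept over -50% to +50%.
result = sweep(2_000, 500, 3.00, 15.00, range_pct=50)
dominant = max(result, key=lambda lever: result[lever][2])

for lever, (lo, hi, swing) in result.items():
    print(f"{lever:<10} low=${lo:.5f} high=${hi:.5f} swing=${swing:.5f}")
print("dominant lever:", dominant)
```

With these assumed numbers the completion swing ($0.0075) is larger than the prompt swing ($0.0060), so the completion is the dominant lever. Raising the baseline prompt to 10,000 tokens at the same prices would flip the result.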

Tips for acting on the result

  • If output dominates: set a tighter max_tokens, ask for concise answers, and avoid prompting for verbose explanations you don’t read.
  • If input dominates: compress the prompt, dedupe context, and use prompt caching so stable context bills at a fraction of the input rate.
  • Re-check after model changes. Switching to a model with different input/output pricing can flip which lever matters.
  • Optimize the dominant lever first. A 20% cut on the lever that drives 80% of cost beats a 50% cut on the one that drives 20%.
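
The arithmetic behind the last tip can be checked in a few lines. This is a minimal sketch with a hypothetical helper name: the saving on the total bill is the lever's share of cost multiplied by the fractional cut applied to it.

```python
def total_saving(share_of_cost, cut):
    """Fraction of the total bill saved by cutting one lever.

    share_of_cost: the lever's share of total cost (0 to 1).
    cut: fractional reduction applied to that lever (0 to 1).
    """
    return share_of_cost * cut


dominant_saving = total_saving(0.80, 0.20)  # 20% cut on the 80% lever
minor_saving = total_saving(0.20, 0.50)     # 50% cut on the 20% lever

print(f"dominant lever: {dominant_saving:.0%} off the total bill")  # 16%
print(f"minor lever:    {minor_saving:.0%} off the total bill")     # 10%
```

The smaller cut on the dominant lever saves 16% of the total, against 10% for the larger cut on the minor lever.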