Prompt-to-completion cost ratio analyzer
Because providers price input and output tokens at different rates, the token split of your workload is not the same as its cost split. A request that is 80% input by token count can still be mostly output by cost. This analyzer breaks down both so you optimize the side that actually drives your bill.
How it works
The analyzer takes your average prompt and completion token counts and applies the model’s separate input and output prices, each quoted per 1,000,000 tokens:
input_cost = prompt_tokens / 1,000,000 × input_price
output_cost = completion_tokens / 1,000,000 × output_price
input_share = input_cost / (input_cost + output_cost)
It then reports the token ratio (prompt vs completion) and the cost ratio side by side. The divergence between the two is the insight: it reveals when a small amount of output is quietly dominating your spend.
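The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the analyzer's actual implementation: the function name `cost_split`, the returned keys, and the example prices (1.00 per million input tokens, 5.00 per million output tokens) are assumptions chosen only to show the arithmetic.

```python
def cost_split(prompt_tokens, completion_tokens, input_price, output_price):
    """Return token and cost shares for one average request.

    Prices are per 1,000,000 tokens, matching the formulas above.
    The function name and returned keys are hypothetical.
    """
    input_cost = prompt_tokens / 1_000_000 * input_price
    output_cost = completion_tokens / 1_000_000 * output_price
    total_cost = input_cost + output_cost
    total_tokens = prompt_tokens + completion_tokens
    if total_cost == 0 or total_tokens == 0:
        raise ValueError("token counts and prices must yield a non-zero total")
    return {
        "input_token_share": prompt_tokens / total_tokens,
        "input_cost_share": input_cost / total_cost,
        "output_cost_share": output_cost / total_cost,
    }


# Hypothetical workload: 800 prompt tokens, 200 completion tokens,
# with made-up prices of 1.00 (input) and 5.00 (output) per million tokens.
result = cost_split(800, 200, input_price=1.00, output_price=5.00)
print(f"input share of tokens: {result['input_token_share']:.1%}")
print(f"input share of cost:   {result['input_cost_share']:.1%}")
print(f"output share of cost:  {result['output_cost_share']:.1%}")
```

With these assumed numbers the request is 80% input by token count, yet output accounts for roughly 56% of the cost, which is the divergence the analyzer is meant to surface.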
Tips and notes
- Optimize the dominant side. If output is most of the cost, cap max_tokens and ask for concise answers. If input dominates, trim prompts and cache stable context.
- Compare models on your ratio. A model with cheap input but pricey output is great for input-heavy work and bad for generation, so match the model to your split.
- Watch the ratio drift. As features evolve, your mix changes. Re-check the split periodically so your cost strategy stays aimed at the right target.
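To illustrate the "compare models on your ratio" tip, here is a rough sketch that prices the same request under two price sheets and picks the cheaper one. The model names, their prices, and the helper functions `request_cost` and `cheapest_model` are all hypothetical placeholders; substitute your provider's real per-million-token prices.

```python
def request_cost(prompt_tokens, completion_tokens, input_price, output_price):
    """Cost of one request; prices are per 1,000,000 tokens."""
    return (prompt_tokens * input_price + completion_tokens * output_price) / 1_000_000


# Hypothetical price sheets as (input_price, output_price) per million tokens.
MODELS = {
    "model_a": (0.50, 6.00),  # cheap input, pricey output
    "model_b": (2.00, 3.00),  # pricier input, cheaper output
}


def cheapest_model(prompt_tokens, completion_tokens, models):
    """Return the name of the model with the lowest cost for this token split."""
    return min(
        models,
        key=lambda name: request_cost(prompt_tokens, completion_tokens, *models[name]),
    )


# Input-heavy workload (long context, short answer).
print(cheapest_model(10_000, 200, MODELS))
# Generation-heavy workload (short prompt, long answer).
print(cheapest_model(200, 2_000, MODELS))
```

Under these assumed prices the input-heavy request is cheaper on model_a and the generation-heavy request is cheaper on model_b. Re-running the same comparison with fresh averages is one simple way to notice ratio drift.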