Context Utilization Efficiency Score

See how efficiently you're using paid context window tokens

Ad placeholder (leaderboard)

Score how efficiently your prompts spend tokens

You pay for every token you send, useful or not. This tool gives two views of efficiency: how full your context window is, and — more importantly — what fraction of each prompt is actually useful. It then prices out the tokens that are pure overhead so you know what trimming them is worth.

How the score works

Window utilization is simply how much of the model’s capacity each request uses:

window_utilization = avg_prompt_tokens / context_window

Content efficiency is the useful share of the prompt. The wasted cost combines the two ideas with your volume and price:

wasted_tokens = avg_prompt_tokens x (1 - useful_fraction)
wasted_monthly = wasted_tokens / 1M x input_price x requests_per_month

A high score means most of what you send earns its keep. A low score means a big slice of every bill is boilerplate, repetition, or stale history.

Tips to raise your score

  • Deduplicate instructions. Repeating the same system guidance in multiple messages is paid for every time — state it once.
  • Prune stale history. In long chats, drop turns the model no longer needs rather than re-sending the whole transcript.
  • Right-size the window. If utilization is 3% you may be paying a premium for a context window you will never fill — a smaller model can be cheaper.
Ad placeholder (rectangle)