Why cap users by tokens instead of requests?

A single request can range from a few hundred to tens of thousands of tokens, so request limits barely bound cost. Token quotas put a hard, predictable ceiling on what any user can cost you in a month, which is what an SLA needs.

What is the cost ceiling and why does it matter?

The cost ceiling is your total token spend if every user consumed their entire quota. Pricing each paid tier above its per-user ceiling guarantees that tier never loses money on inference, even in a worst-case usage month.

Should I expect users to hit their full quota?

Usually not — most users consume a fraction of their allowance, so real spend is well below the ceiling. But planning to the ceiling means a usage spike or abuse never turns into an unbounded bill.

Is my data sent anywhere?

No. The planner runs entirely in your browser. Nothing you enter is uploaded, stored or logged.

What is the Token-Based SLA Planner?

Design tiered monthly token quotas for free, pro and enterprise users. See the maximum cost per tier, blended cost per user, and your total cost ceiling at any user mix so you can price plans that never lose money on tokens. It runs free in your browser on Gera Tools, with nothing uploaded.

Token-Based SLA Planner

Name: Token-Based SLA Planner
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Plan token quotas that protect your margin

If your product passes LLM calls through to users, an unbounded user is an unbounded bill. Token-based quotas turn that risk into a fixed, plannable number: each tier gets a monthly token allowance, and your maximum exposure is simply the sum of every user's allowance times your token price. This planner computes the worst-case cost per tier and your total cost ceiling so you can price every paid plan above what it can ever cost.

How it works

For each tier, the maximum monthly cost per user is:

max_cost_per_user = tier_tokens / 1,000,000 × blended_price
tier_ceiling      = max_cost_per_user × users_in_tier

Summed across tiers, the total cost ceiling is the most you could spend on inference in a worst-case month. The planner also reports a blended cost per user across your whole base (the total ceiling divided by total users), which is the number to keep below your blended revenue per user. As long as each paid tier’s price exceeds its per-user ceiling, that tier’s unit economics hold no matter how heavily users engage.
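
The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the planner's actual code: the `Tier` class, the function names, and the two-tier plan at $4.00 per million tokens are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    monthly_tokens: int  # token allowance per user per month
    users: int           # number of users on this tier


def max_cost_per_user(tier: Tier, blended_price: float) -> float:
    """Worst-case monthly cost of one user (blended_price is dollars per million tokens)."""
    return tier.monthly_tokens / 1_000_000 * blended_price


def tier_ceiling(tier: Tier, blended_price: float) -> float:
    """Worst-case monthly cost of the whole tier."""
    return max_cost_per_user(tier, blended_price) * tier.users


def plan_summary(tiers: list[Tier], blended_price: float) -> dict[str, float]:
    """Total cost ceiling and blended worst-case cost per user across all tiers."""
    total_ceiling = sum(tier_ceiling(t, blended_price) for t in tiers)
    total_users = sum(t.users for t in tiers)
    blended_cost = total_ceiling / total_users if total_users else 0.0
    return {
        "total_ceiling": total_ceiling,
        "total_users": total_users,
        "blended_cost_per_user": blended_cost,
    }


# Hypothetical plan, for illustration only.
tiers = [
    Tier("Starter", monthly_tokens=100_000, users=500),
    Tier("Team", monthly_tokens=1_000_000, users=50),
]
summary = plan_summary(tiers, blended_price=4.00)

for t in tiers:
    print(f"{t.name}: ${max_cost_per_user(t, 4.00):.2f}/user, ceiling ${tier_ceiling(t, 4.00):.2f}")
print(f"Total ceiling: ${summary['total_ceiling']:.2f}")
print(f"Blended cost per user: ${summary['blended_cost_per_user']:.2f}")
```

The blended cost per user here is the total ceiling divided by total users, so it is a worst-case figure, not an expected one.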

Worked example

Imagine a SaaS product with three tiers and a blended token cost of $2.00 per million tokens:

Tier	Monthly tokens	Users	Max cost/user	Tier ceiling
Free	50,000	1,000	$0.10	$100
Pro at $19/mo	2,000,000	200	$4.00	$800
Enterprise at $99/mo	10,000,000	20	$20.00	$400
Total		1,220		$1,300

For example, the Pro tier charges $19/month and the maximum inference cost per Pro user is $4.00 — that is a $15 gross margin per Pro user at full quota usage. In practice, most Pro users will consume a fraction of their allowance, so actual margins are higher. But the plan guarantees positive unit economics even in the worst case.

The Free tier costs at most $100/month across 1,000 users — a manageable acquisition cost that is worth covering if free users convert to Pro at a reasonable rate.
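
As a quick check on the table, the short sketch below recomputes the worst-case margin per user for each tier from the example figures. The function name `worst_case_margin` is illustrative, and the Free tier is assumed to be priced at $0.

```python
BLENDED_PRICE = 2.00  # dollars per million tokens, from the worked example

# (tier name, monthly price in dollars, monthly token quota, users)
TIERS = [
    ("Free", 0.00, 50_000, 1_000),  # price of $0 is an assumption for the free tier
    ("Pro", 19.00, 2_000_000, 200),
    ("Enterprise", 99.00, 10_000_000, 20),
]


def worst_case_margin(price: float, monthly_tokens: int, blended_price: float) -> float:
    """Price minus inference cost if the user burns the entire quota."""
    return price - monthly_tokens / 1_000_000 * blended_price


margins = {
    name: worst_case_margin(price, tokens, BLENDED_PRICE)
    for name, price, tokens, _users in TIERS
}
total_ceiling = sum(
    tokens / 1_000_000 * BLENDED_PRICE * users
    for _name, _price, tokens, users in TIERS
)

for name, margin in margins.items():
    print(f"{name}: worst-case margin ${margin:.2f} per user")
print(f"Total cost ceiling: ${total_ceiling:.2f}")
```

Both paid tiers stay positive at full quota usage, while the Free tier shows a small negative margin per user, which is the acquisition cost described above.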

Setting the blended token price

The blended price is your effective per-million-token cost across all models you serve, weighted by the mix of input and output tokens. For example:

If your calls are 80% input and 20% output, and input costs $3/M while output costs $15/M:
Blended = (0.8 × $3) + (0.2 × $15) = $2.40 + $3.00 = $5.40 per million tokens

Prompt caching can reduce the effective input cost significantly. If you cache system prompts heavily, the input component of the blended price can drop by 50–80%.
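
The blending arithmetic, with an optional caching adjustment, might look like the sketch below. The function `blended_price` and its `input_discount` parameter are hypothetical, and the code assumes the caching saving applies as a flat percentage reduction on the whole input component. Real savings depend on how much of each prompt is actually served from cache.

```python
def blended_price(
    input_share: float,
    input_price: float,
    output_price: float,
    input_discount: float = 0.0,
) -> float:
    """Effective dollars per million tokens for a given input/output mix.

    input_share    -- fraction of tokens that are input (0.0 to 1.0)
    input_discount -- assumed fractional saving on input cost from caching
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    if not 0.0 <= input_discount <= 1.0:
        raise ValueError("input_discount must be between 0 and 1")
    effective_input = input_price * (1.0 - input_discount)
    return input_share * effective_input + (1.0 - input_share) * output_price


# The 80/20 example above: $3/M input, $15/M output.
base = blended_price(0.8, 3.0, 15.0)

# Assumed caching scenarios: input component reduced by 50% and by 80%.
cached_50 = blended_price(0.8, 3.0, 15.0, input_discount=0.5)
cached_80 = blended_price(0.8, 3.0, 15.0, input_discount=0.8)

print(f"No caching:        ${base:.2f} per million tokens")
print(f"50% input saving:  ${cached_50:.2f} per million tokens")
print(f"80% input saving:  ${cached_80:.2f} per million tokens")
```

Under these assumptions the output component ($3.00) is untouched by caching, so the blended price falls less sharply than the input saving alone suggests.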

Tips for token SLAs

Price above the ceiling, not the average. Averages feel safe until a power user or an abuse spike arrives; the ceiling is what an SLA must cover.
Make quotas visible. Show users their remaining tokens so the limit is a feature, not a surprise wall.
Offer overage, don’t just block. Metered overage above the quota recovers cost from heavy users instead of cutting them off.
Re-blend the price when models change. Switching default models or enabling caching shifts your effective per-token cost — re-run the plan when it does.
Review free tier cost as a share of CAC. The free tier inference cost is a customer acquisition cost. As long as it is well below the LTV of a converted user, it is a legitimate investment.
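
The visibility and overage tips can be illustrated with a small helper that reports a user's remaining tokens and any metered overage charge. This is a sketch only: `QuotaStatus`, `quota_status`, and the $4.00 per million overage rate are hypothetical and not part of the planner.

```python
from dataclasses import dataclass


@dataclass
class QuotaStatus:
    remaining_tokens: int   # what to show the user as "tokens left"
    overage_tokens: int     # tokens used beyond the quota
    overage_charge: float   # dollars billed for the overage


def quota_status(
    used_tokens: int,
    quota_tokens: int,
    overage_price_per_million: float,
) -> QuotaStatus:
    """Summarize usage against a monthly quota with metered overage."""
    remaining = max(quota_tokens - used_tokens, 0)
    overage = max(used_tokens - quota_tokens, 0)
    charge = overage / 1_000_000 * overage_price_per_million
    return QuotaStatus(remaining, overage, charge)


PRO_QUOTA = 2_000_000   # Pro allowance from the worked example
OVERAGE_RATE = 4.00     # hypothetical rate, set above the $2.00 blended cost

under = quota_status(1_500_000, PRO_QUOTA, OVERAGE_RATE)
over = quota_status(2_500_000, PRO_QUOTA, OVERAGE_RATE)

print(f"Under quota: {under.remaining_tokens:,} tokens left, charge ${under.overage_charge:.2f}")
print(f"Over quota:  {over.overage_tokens:,} tokens over, charge ${over.overage_charge:.2f}")
```

Setting the overage rate above the blended token price keeps usage beyond the quota margin-positive, in line with pricing above the ceiling.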