Plan token quotas that protect your margin
If your product passes LLM calls through to users, an unbounded user is an unbounded bill. Token-based quotas turn that risk into a fixed, plannable number: each tier gives every user a monthly token allowance, and your maximum exposure is the sum of those allowances across all users times your price per token. This planner computes the worst-case cost per tier and your total cost ceiling so you can price every plan above what it can ever cost.
How it works
For each tier, the maximum monthly cost per user and the ceiling for the whole tier are (with blended_price expressed per million tokens):
max_cost_per_user = tier_tokens / 1,000,000 × blended_price
tier_ceiling = max_cost_per_user × users_in_tier
Summed across tiers, the total cost ceiling is the most you could spend on inference in a worst-case month, when every user exhausts their quota. The planner also reports a blended cost per user across your whole base, which is the number to keep below your blended revenue per user. As long as each tier’s price exceeds its per-user ceiling and the quota is enforced, inference cost per user stays below revenue per user no matter how heavily users engage.
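The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the planner's actual implementation: the `Tier` class, the `plan` function, the example tiers, and the blended price of 4.0 per million tokens are all hypothetical, and the blended cost per user is assumed to be the total ceiling divided by the total user count.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    monthly_tokens: int  # per-user monthly token allowance
    users: int           # number of users on this tier
    price: float         # monthly price per user (same currency as the token price)


def max_cost_per_user(tier: Tier, blended_price_per_million: float) -> float:
    """Worst-case monthly inference cost for one user who exhausts the quota."""
    return tier.monthly_tokens / 1_000_000 * blended_price_per_million


def plan(tiers: list[Tier], blended_price_per_million: float):
    """Return per-tier rows, the total cost ceiling, and the blended cost per user."""
    rows = []
    for tier in tiers:
        per_user = max_cost_per_user(tier, blended_price_per_million)
        rows.append({
            "tier": tier.name,
            "max_cost_per_user": per_user,
            "tier_ceiling": per_user * tier.users,
            # True when the tier's price exceeds its per-user ceiling.
            "covers_ceiling": tier.price > per_user,
        })
    total_ceiling = sum(row["tier_ceiling"] for row in rows)
    total_users = sum(tier.users for tier in tiers)
    # Assumption: "blended cost per user" = total ceiling / total users.
    blended_cost_per_user = total_ceiling / total_users if total_users else 0.0
    return rows, total_ceiling, blended_cost_per_user


# Hypothetical example: a free tier and a paid tier, 4.0 per million tokens.
tiers = [
    Tier("Free", monthly_tokens=100_000, users=1_000, price=0.0),
    Tier("Pro", monthly_tokens=2_000_000, users=200, price=20.0),
]
rows, total_ceiling, blended_cost = plan(tiers, blended_price_per_million=4.0)

for row in rows:
    print(row)
print(f"total ceiling: {total_ceiling:.2f}, blended cost per user: {blended_cost:.2f}")
```

In this example the free tier cannot cover its own ceiling, so `covers_ceiling` is false for it; the comparison that matters for the base as a whole is blended cost per user against blended revenue per user.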
Tips for token SLAs
- Price above the ceiling, not the average. Averages feel safe until a power user or an abuse spike arrives; the ceiling is what an SLA must cover.
- Make quotas visible. Show users their remaining tokens so the limit is a feature, not a surprise wall.
- Offer overage, don’t just block. Metered overage above the quota recovers cost from heavy users instead of cutting them off.
- Re-blend the price when models change. Switching default models or enabling caching shifts your effective per-token cost, so re-run the plan when it does.
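Two of the tips above lend themselves to a short sketch: re-blending the price and charging metered overage. The Python below is one possible way to do both, under stated assumptions. The function names, the input/output weighting scheme, the treatment of cached input tokens, and every price are hypothetical and are not taken from any provider's rate card.

```python
def blended_price(
    input_price: float,        # price per million input tokens
    output_price: float,       # price per million output tokens
    output_share: float,       # fraction of all tokens that are output tokens
    cached_share: float = 0.0,          # fraction of input tokens served from cache
    cached_input_price: float = 0.0,    # price per million cached input tokens
) -> float:
    """Weighted per-million-token price across input, cached input, and output."""
    if not 0.0 <= output_share <= 1.0 or not 0.0 <= cached_share <= 1.0:
        raise ValueError("shares must be between 0 and 1")
    # Effective input price after mixing cached and uncached input tokens.
    effective_input = (1 - cached_share) * input_price + cached_share * cached_input_price
    return (1 - output_share) * effective_input + output_share * output_price


def overage_charge(used_tokens: int, quota_tokens: int, overage_price_per_million: float) -> float:
    """Bill only the tokens used above the quota; usage within the quota costs nothing extra."""
    extra_tokens = max(0, used_tokens - quota_tokens)
    return extra_tokens / 1_000_000 * overage_price_per_million


# Hypothetical prices: re-blend before and after enabling caching.
before = blended_price(input_price=3.0, output_price=15.0, output_share=0.25)
after = blended_price(input_price=3.0, output_price=15.0, output_share=0.25,
                      cached_share=0.5, cached_input_price=0.3)
print(f"blended price before caching: {before:.4f}, after: {after:.4f}")

# A user on a 2M-token quota who consumed 2.5M tokens this month.
print(f"overage: {overage_charge(2_500_000, 2_000_000, overage_price_per_million=10.0):.2f}")
```

Setting the overage price above the blended price keeps heavy usage profitable instead of merely cost-neutral.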