LLM Rate Limit Calculator

Calculate safe request rates given your LLM API tier limits.


Hitting HTTP 429 errors means you are sending requests faster than your tier allows. This calculator turns your requests-per-minute (RPM) and tokens-per-minute (TPM) limits, plus your typical request size, into concrete, safe settings: a sustainable request rate, a concurrency level, and a minimum delay between calls.

How it works

Providers enforce two ceilings simultaneously — requests per minute and tokens per minute — and your real throughput is bounded by whichever you hit first:

  • RPM-bound rate = your RPM limit (requests per minute).
  • TPM-bound rate = TPM limit ÷ tokens per request (also in requests per minute).

The calculator takes the smaller of the two and scales it down by a safety margin (by default it targets 90% of the limit) to absorb token-count variance and clock drift. It then reports which constraint is binding, a safe concurrency estimate, and the inter-request delay (60 seconds ÷ safe requests per minute). All math runs locally in your browser.
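The arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator's actual source: the function and field names are hypothetical, and the concurrency estimate assumes Little's law (requests in flight = request rate × average latency) with an assumed average latency, since the page does not say how concurrency is derived.

```python
import math
from dataclasses import dataclass


@dataclass
class SafeSettings:
    binding_constraint: str      # "RPM" or "TPM"
    requests_per_minute: float   # sustainable rate after the safety margin
    delay_seconds: float         # minimum gap between request starts
    concurrency: int             # estimated safe number of in-flight requests


def safe_settings(rpm_limit: float, tpm_limit: float, tokens_per_request: float,
                  safety: float = 0.9, avg_latency_seconds: float = 2.0) -> SafeSettings:
    """Hypothetical helper mirroring the steps described in the text."""
    if min(rpm_limit, tpm_limit, tokens_per_request) <= 0:
        raise ValueError("limits and tokens_per_request must be positive")
    if not 0 < safety <= 1:
        raise ValueError("safety must be in (0, 1]")

    rpm_bound = float(rpm_limit)                 # ceiling from the request limit
    tpm_bound = tpm_limit / tokens_per_request   # ceiling from the token limit
    binding = "RPM" if rpm_bound <= tpm_bound else "TPM"

    safe_rpm = min(rpm_bound, tpm_bound) * safety
    delay = 60.0 / safe_rpm

    # Assumption: Little's law, using an assumed average request latency.
    concurrency = max(1, math.floor(safe_rpm / 60.0 * avg_latency_seconds))
    return SafeSettings(binding, safe_rpm, delay, concurrency)
```

For example, with a 500 RPM limit, a 200,000 TPM limit, and 1,000 tokens per request, the token limit binds (200 requests per minute), so the sketch suggests about 180 requests per minute, roughly one request every 0.33 seconds.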

Tips

If you are TPM-bound, shrinking prompts (trimming history, using retrieval) buys more throughput than a higher RPM limit ever will. If you are RPM-bound, batching multiple items into one request helps. Either way, keep exponential backoff and Retry-After handling in your client: token estimates drift and quotas may be shared, so staying under the average rate reduces 429s but never eliminates them entirely.
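One way to combine exponential backoff with Retry-After handling is sketched below. It is a minimal illustration under stated assumptions: `send` is a hypothetical callable that performs one request and returns a `(status, headers, body)` tuple, the header key is assumed to be spelled exactly `Retry-After`, and only the seconds form of the header is parsed (the HTTP-date form falls through to backoff).

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try (attempt is 0-based)."""
    if retry_after is not None:
        try:
            # Honor the server's hint when it is given in seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Retry `send` on HTTP 429 until it succeeds or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, headers, body  # still rate limited after the last attempt
```

The jitter spreads retries from clients that were throttled at the same moment, so they do not all return in the same instant and trip the limit again.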
