Rate Limit Burst Capacity Planner

Plan request queuing to absorb traffic bursts within your TPM limit



Traffic rarely arrives evenly. A scheduled job, a viral moment, or a batch import can dump thousands of requests into a few seconds, and if the resulting token demand exceeds your provider’s tokens-per-minute (TPM) limit, you get a wall of 429 errors. This planner models the burst, tells you whether it fits, and sizes the queue depth and drain time you need to absorb it gracefully.

How it works

The planner converts your burst into a token demand and compares it to your TPM capacity expressed per second:

burst_tokens   = requests × avg_tokens_per_request
capacity/sec   = TPM / 60
demand/sec     = burst_tokens / burst_duration
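The three formulas translate directly into code. The following is a minimal Python sketch, not the planner's own code: `burst_demand` and `BurstDemand` are hypothetical names. The example numbers (2,000 requests of about 500 tokens each, arriving over 10 seconds against a 600,000 TPM limit) are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class BurstDemand:
    burst_tokens: float      # total tokens the burst asks for
    capacity_per_sec: float  # TPM limit spread evenly over each second
    demand_per_sec: float    # burst tokens spread evenly over the burst

    @property
    def fits(self) -> bool:
        # No backlog forms while demand stays at or below capacity.
        return self.demand_per_sec <= self.capacity_per_sec


def burst_demand(requests: int, avg_tokens_per_request: float,
                 tpm: float, burst_duration_s: float) -> BurstDemand:
    """Hypothetical helper mirroring burst_tokens, capacity/sec and demand/sec."""
    if burst_duration_s <= 0:
        raise ValueError("burst_duration_s must be positive")
    burst_tokens = requests * avg_tokens_per_request
    capacity_per_sec = tpm / 60
    demand_per_sec = burst_tokens / burst_duration_s
    return BurstDemand(burst_tokens, capacity_per_sec, demand_per_sec)


d = burst_demand(requests=2_000, avg_tokens_per_request=500,
                 tpm=600_000, burst_duration_s=10)
print(d.burst_tokens, d.capacity_per_sec, d.demand_per_sec, d.fits)
# prints: 1000000 10000.0 100000.0 False
```

In this example, demand runs at ten times capacity, so the burst does not fit and a queue is needed.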

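If demand per second exceeds capacity per second, a backlog forms. The peak queue depth is the overflow accumulated over the burst, (demand/sec − capacity/sec) × burst_duration tokens. Once the burst ends, that backlog drains at the rate of spare capacity (capacity/sec minus any traffic that keeps arriving), so drain time = peak queue depth / spare capacity per second. The tool also applies your retry cost multiplier to estimate the extra spend that a naive retry-on-429 strategy would incur compared with smooth queuing.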
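When a burst does not fit, the backlog and drain time follow from the same quantities. This is a minimal sketch that assumes demand is spread evenly over the burst and capacity is enforced smoothly per second. `plan_backlog`, `BacklogPlan`, and the `spare_per_sec` parameter are hypothetical. By default, spare capacity is the full per-second capacity, which assumes no other traffic competes for it after the burst.

```python
import math
from typing import NamedTuple, Optional


class BacklogPlan(NamedTuple):
    peak_queue_tokens: float  # overflow still queued when the burst ends
    drain_seconds: float      # time to clear it afterwards (inf if it never clears)


def plan_backlog(burst_tokens: float, capacity_per_sec: float,
                 burst_duration_s: float,
                 spare_per_sec: Optional[float] = None) -> BacklogPlan:
    """Hypothetical helper: peak queue depth and drain time for one burst."""
    # (demand/sec - capacity/sec) * burst_duration simplifies to
    # burst_tokens - capacity_per_sec * burst_duration; clamp at zero.
    peak = max(0.0, float(burst_tokens) - capacity_per_sec * burst_duration_s)
    if peak == 0.0:
        return BacklogPlan(0.0, 0.0)
    # Assumption: with no other traffic, all capacity is spare after the burst.
    spare = capacity_per_sec if spare_per_sec is None else spare_per_sec
    if spare <= 0:
        return BacklogPlan(peak, math.inf)  # the queue never drains
    return BacklogPlan(peak, peak / spare)


plan = plan_backlog(burst_tokens=1_000_000, capacity_per_sec=10_000,
                    burst_duration_s=10)
print(plan.peak_queue_tokens, plan.drain_seconds)
# prints: 900000.0 90.0
```

To express the peak in requests, divide it by the average tokens per request. At 500 tokens per request, the 900,000-token backlog here is 1,800 queued requests.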

Tips and notes

  • Size your queue for the peak burst, not the average — averages hide the spikes that actually trip the limit.
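  • Prefer a token-bucket queue with jittered exponential backoff over blind retries: the token bucket paces requests so they stay just under the limit, and the jittered backoff spreads out retries of any 429s that still occur, which avoids retry storms.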
  • If drain time is unacceptably long, the real fix is a higher TPM tier or splitting traffic across multiple keys or providers, not more retries.
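The token-bucket-plus-backoff advice can be sketched as follows. This is a minimal, single-threaded sketch, not the planner's implementation. `TokenBucket`, `reserve`, `backoff_delay`, `send_paced`, and the `RateLimited` exception are hypothetical names; your HTTP client's real 429 error would replace `RateLimited`. The bucket refills at TPM / 60 tokens per second and, by default, allows a burst of one second's worth of tokens (an assumption). The backoff uses the "full jitter" variant: a random delay between zero and an exponentially growing cap. That is one common choice among several.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for whatever your client raises on HTTP 429."""


class TokenBucket:
    """Client-side pacer that refills at tpm / 60 tokens per second."""

    def __init__(self, tpm: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = tpm / 60.0
        # Assumption: allow at most one second's worth of tokens as a burst.
        self.capacity = self.rate if capacity is None else capacity
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def reserve(self, cost: float) -> float:
        """Take `cost` tokens now and return how long to wait before sending.

        The balance may go negative; the returned wait is the time refills
        need to bring it back to zero, so callers never send ahead of budget.
        """
        if cost > self.capacity:
            raise ValueError("request larger than bucket capacity")
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= cost
        return max(0.0, -self.tokens / self.rate)


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    r = rng if rng is not None else random
    return r.uniform(0.0, min(cap, base * 2 ** attempt))


def send_paced(bucket: TokenBucket, cost: float, send: Callable[[], T],
               max_retries: int = 5,
               sleep: Callable[[float], None] = time.sleep,
               rng: Optional[random.Random] = None) -> T:
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        sleep(bucket.reserve(cost))  # pace first, so 429s should be rare
        try:
            return send()
        except RateLimited:
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt, rng=rng))  # spread retries out
    raise AssertionError("unreachable")
```

Pacing before each send is what keeps traffic just under the limit. The jittered backoff only handles the occasional 429 that slips through, and the randomness keeps concurrent clients from retrying in lockstep.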