Rate limit burst capacity planner
Traffic rarely arrives evenly. A scheduled job, a viral moment, or a batch
import can dump thousands of requests into a few seconds. If the tokens those
requests consume exceed your provider’s tokens-per-minute (TPM) limit, you get
a wall of 429 errors. This planner models the burst, tells you whether it fits,
and sizes the queue depth and drain time you need to absorb it gracefully.
How it works
The planner converts your burst into a token demand and compares it to your TPM capacity expressed per second:
burst_tokens = requests × avg_tokens_per_request
capacity/sec = TPM / 60
demand/sec = burst_tokens / burst_duration
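If demand per second exceeds capacity per second, a backlog forms during the burst. The peak queue depth is the overflow accumulated by the end of the burst, (demand/sec − capacity/sec) × burst_duration, and it drains at the rate of spare capacity once the burst ends, so drain time is the peak queue depth divided by that spare capacity per second. The tool also applies your retry cost multiplier to estimate the extra spend a naive retry-on-429 strategy would incur versus smooth queuing.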
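The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the planner's actual code: the names `plan_burst` and `BurstPlan` are hypothetical, the queue is measured in tokens, and the sketch assumes no other traffic arrives after the burst, so the full capacity per second is available for draining. The retry cost multiplier is left out because the text does not say how the tool applies it.

```python
from dataclasses import dataclass


@dataclass
class BurstPlan:
    burst_tokens: float
    capacity_per_sec: float
    demand_per_sec: float
    fits: bool
    peak_queue_tokens: float
    drain_seconds: float


def plan_burst(requests: int, avg_tokens_per_request: float,
               burst_duration_sec: float, tpm: float) -> BurstPlan:
    """Hypothetical helper mirroring the three formulas in the text."""
    if burst_duration_sec <= 0 or tpm <= 0:
        raise ValueError("burst_duration_sec and tpm must be positive")

    burst_tokens = requests * avg_tokens_per_request
    capacity_per_sec = tpm / 60
    demand_per_sec = burst_tokens / burst_duration_sec

    # Backlog grows at (demand - capacity) tokens/sec for the whole burst.
    overflow_per_sec = max(0.0, demand_per_sec - capacity_per_sec)
    peak_queue_tokens = overflow_per_sec * burst_duration_sec

    # Assumption: after the burst, all capacity is spare and drains the queue.
    drain_seconds = peak_queue_tokens / capacity_per_sec

    return BurstPlan(
        burst_tokens=burst_tokens,
        capacity_per_sec=capacity_per_sec,
        demand_per_sec=demand_per_sec,
        fits=demand_per_sec <= capacity_per_sec,
        peak_queue_tokens=peak_queue_tokens,
        drain_seconds=drain_seconds,
    )


if __name__ == "__main__":
    # Illustrative inputs only: 2,000 requests of 1,500 tokens in 10 seconds
    # against a 600,000 TPM limit.
    print(plan_burst(requests=2000, avg_tokens_per_request=1500,
                     burst_duration_sec=10, tpm=600_000))
```

With those illustrative inputs, capacity is 10,000 tokens per second and demand is 300,000 tokens per second, so the sketch reports a peak queue of 2,900,000 tokens and a drain time of 290 seconds.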
Tips and notes
- Size your queue for the peak burst, not the average — averages hide the spikes that actually trip the limit.
- Prefer a token-bucket queue with jittered exponential backoff over blind retries; it keeps you just under the limit and avoids retry storms.
- If drain time is unacceptably long, the real fix is a higher TPM tier or splitting traffic across multiple keys or providers, not more retries.
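The token-bucket tip can be sketched as follows. This is one possible shape under stated assumptions, not a reference implementation: `TokenBucket`, `backoff_delay`, and `send_with_bucket` are hypothetical names, the bucket holds one second of capacity by default, the backoff uses the "full jitter" variant (a uniform delay up to an exponentially growing cap), and `send` stands in for whatever function actually calls your provider.

```python
import random
import time


class TokenBucket:
    """Token bucket refilled continuously at TPM / 60 tokens per second."""

    def __init__(self, tpm, burst_capacity=None, clock=time.monotonic):
        self.rate = tpm / 60.0
        # Assumption: by default the bucket holds one second of capacity.
        self.capacity = burst_capacity if burst_capacity is not None else self.rate
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, cost):
        """Spend `cost` tokens if available; return whether it succeeded."""
        self._refill()
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full jitter: uniform delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def send_with_bucket(bucket, cost, send, max_attempts=6, sleep=time.sleep):
    """Call `send()` only once the bucket can cover `cost` tokens.

    Note: a request whose cost exceeds the bucket capacity can never pass.
    """
    for attempt in range(max_attempts):
        if bucket.try_acquire(cost):
            return send()
        sleep(backoff_delay(attempt))
    raise RuntimeError("token budget not available after retries")
```

Gating requests on the client side this way means most waiting happens before a request is sent, rather than after the provider has already rejected it with a 429.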