Question 1

What do TPM and RPM mean?

Accepted Answer

TPM is tokens per minute: the total number of input plus output tokens you can process across all requests in a rolling minute. RPM is requests per minute: how many separate API calls you can make in that same window. AI providers typically enforce both at once, so you can be throttled by hitting either ceiling even if the other has headroom.
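To make the "either ceiling" point concrete, here is a minimal sketch of the arithmetic. The function names and the numbers in the usage note are illustrative assumptions, not any provider's published limits.

```python
def effective_requests_per_minute(rpm_limit: int, tpm_limit: int,
                                  avg_tokens_per_request: int) -> int:
    """Requests per minute you can sustain when both ceilings apply."""
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # How many average-sized requests fit inside the token budget.
    by_tokens = tpm_limit // avg_tokens_per_request
    # Whichever ceiling is lower is the one you actually hit.
    return min(rpm_limit, by_tokens)


def binding_limit(rpm_limit: int, tpm_limit: int,
                  avg_tokens_per_request: int) -> str:
    """Name the limit that throttles you first: 'TPM' or 'RPM'."""
    by_tokens = tpm_limit // avg_tokens_per_request
    return "TPM" if by_tokens < rpm_limit else "RPM"
```

With a hypothetical 500 RPM and 200,000 TPM, requests averaging 1,000 tokens are capped at 200 per minute by TPM, while requests averaging 100 tokens are capped at 500 per minute by RPM.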

Question 2

Why do I get a 429 error?

Accepted Answer

A 429 'Too Many Requests' status means you exceeded a rate limit, usually TPM or RPM, sometimes a daily token or spend cap. It is a temporary signal to slow down, not a permanent failure. The right response is to wait and retry with exponential backoff, doubling the wait after each failed attempt, rather than immediately hammering the endpoint again.
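The wait-and-retry pattern can be sketched without tying it to a particular SDK. In the sketch below, `RateLimitError` and `call_with_backoff` are hypothetical names: the exception stands in for whatever your client raises on a 429, and `retry_after` assumes the server sent a standard `Retry-After` hint, which it may not.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(fn: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Never wait less than the server asked for.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            sleep(delay)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable only so the function can be exercised in tests without real waiting; in production the default `time.sleep` is what you want.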

Question 3

How do I increase my rate limits?

Accepted Answer

Most providers raise limits automatically as your account ages and your usage and payment history grow, moving you up through usage tiers. You can often accelerate this by pre-paying credits, completing identity or organisation verification, or requesting a limit increase through the provider's dashboard for production workloads.

Question 4

What is the best way to handle rate limits in code?

Accepted Answer

Combine three techniques: retry 429 responses with exponential backoff and jitter, queue requests client-side so you never exceed your per-minute budget, and read the rate-limit headers the API returns to track remaining capacity. Together these keep throughput high without tripping limits or wasting requests on doomed retries.
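As a rough sketch of how the three pieces might fit together: a jittered delay for retries, a sliding-window budget for client-side pacing, and a helper that reads remaining capacity from response headers. All names here are hypothetical, and the header name `x-ratelimit-remaining-requests` is an assumption to check against your provider's documentation, since header names vary by provider.

```python
import random
import time
from collections import deque
from typing import Callable, Deque, Mapping, Optional, Tuple


def jittered_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                   rng: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: random delay between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2 ** attempt)


class MinuteBudget:
    """Client-side sliding window tracking requests and tokens sent."""

    def __init__(self, rpm: int, tpm: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rpm < 1 or tpm < 1:
            raise ValueError("rpm and tpm must be at least 1")
        self.rpm, self.tpm, self.window, self.clock = rpm, tpm, window, clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)
        self.tokens_in_window = 0

    def _expire(self, now: float) -> None:
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def wait_time(self, tokens: int) -> float:
        """Seconds until the oldest request expires; 0.0 if the request fits now.

        Callers should re-check after waiting, since one expiry may not
        free enough capacity.
        """
        if tokens > self.tpm:
            raise ValueError("request exceeds the whole per-minute token budget")
        now = self.clock()
        self._expire(now)
        fits_requests = len(self.events) < self.rpm
        fits_tokens = self.tokens_in_window + tokens <= self.tpm
        if fits_requests and fits_tokens:
            return 0.0
        return self.window - (now - self.events[0][0])

    def record(self, tokens: int) -> None:
        """Note that a request using `tokens` tokens was just sent."""
        self.events.append((self.clock(), tokens))
        self.tokens_in_window += tokens


def remaining_from_headers(headers: Mapping[str, str],
                           name: str = "x-ratelimit-remaining-requests"
                           ) -> Optional[int]:
    """Read a remaining-capacity header; the default name is an assumption."""
    lowered = {k.lower(): v for k, v in headers.items()}  # headers are case-insensitive
    value = lowered.get(name.lower())
    return int(value) if value is not None and value.isdigit() else None
```

A typical loop would call `wait_time` before each request, sleep while it returns a positive value, call `record` once the request is sent, and fall back to `jittered_delay` only if a 429 still slips through. Token counts passed to the budget are estimates on the client side, so leaving some margin below the real limits is a reasonable precaution.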

AI API Rate Limits Explained: OpenAI, Anthropic, and Google

What a rate limit actually is

TPM vs RPM: the two limits that matter most

How limits differ across providers and tiers

Handling 429s gracefully

Strategies to stay within your limits