What is exponential backoff?

Exponential backoff increases the wait between retries multiplicatively — for example 1s, 2s, 4s, 8s. This gives an overloaded or rate-limited service time to recover instead of hammering it with immediate retries, which is why it is the standard pattern for resilient API clients.

If many clients retry on the same fixed schedule, they all hit the server at the same moments and create synchronized spikes: the thundering-herd problem. Jitter randomizes each delay so retries spread out. Full jitter picks a random value between zero and the computed delay; equal jitter keeps half the delay fixed and randomizes the other half.

How do I pick a timeout budget?

Your overall request timeout should be at least the worst-case cumulative delay plus the latency of every attempt, counting the original call as well as each retry. The calculator gives you the cumulative backoff total so you can add your per-call latency on top and set a sensible deadline.

Should I retry every error?

No. Retry transient errors (HTTP 429 rate limits, 500/502/503/504, and network timeouts) but not other 4xx client errors such as a malformed request or an invalid API key, which will fail identically every time. Retrying those just wastes the budget and delays surfacing the real problem.

What is the LLM API Retry Strategy Calculator?

Configure base delay, max retries, multiplier, and jitter; the tool calculates the full retry schedule, expected total wait time, and worst-case timeout budget for resilient LLM API integrations. It runs free in your browser on Gera Tools, with nothing uploaded.

LLM API Retry Strategy Calculator

Name: LLM API Retry Strategy Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Size your retry behavior before it bites you in production

LLM APIs throttle and occasionally fail, so any serious integration needs retry logic. Get the parameters wrong and you either give up too soon or stack delays that blow past your request timeout. This calculator turns your base delay, retry count, multiplier, and jitter choice into a concrete schedule — every attempt’s delay, the running total, and the worst-case budget you need to allow.

How it works

For retry number n (starting at zero), the computed delay before that retry, capped at max_delay and before any jitter is applied, is:

delay(n) = min(max_delay, base_delay × multiplier^n)

The tool applies your chosen jitter mode to that computed delay, then sums the upper bound of every delay across all attempts to give the worst-case cumulative wait. Add your per-call API latency on top to arrive at the total timeout budget you need to allow.
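As a rough illustration of the formula above, the following Python sketch builds the un-jittered schedule and its worst-case total. The function name backoff_schedule and the example parameter values are hypothetical and are not taken from the calculator itself. Because full and equal jitter never exceed the computed delay, the sum of the computed delays is also the worst-case cumulative wait for those modes.

```python
def backoff_schedule(base_delay, multiplier, max_delay, max_retries):
    """Return the computed (pre-jitter) delay before each retry.

    Retry n (starting at zero) waits min(max_delay, base_delay * multiplier**n).
    """
    return [min(max_delay, base_delay * multiplier ** n) for n in range(max_retries)]


# Example values chosen for illustration only.
schedule = backoff_schedule(base_delay=1.0, multiplier=2.0, max_delay=30.0, max_retries=4)
worst_case_wait = sum(schedule)  # upper bound on total time spent waiting

# A low cap flattens the tail of the schedule.
capped = backoff_schedule(base_delay=1.0, multiplier=2.0, max_delay=5.0, max_retries=4)
```

With these inputs the schedule is 1s, 2s, 4s, 8s, for a worst-case wait of 15 seconds; with the cap lowered to 5 seconds the last delay becomes 5s instead of 8s.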

The three jitter modes

No jitter — delays are exact: 1s, 2s, 4s, 8s. Simple, but if many clients start at the same time, they all hit the same retry schedule and generate synchronized spikes — the thundering herd problem.

Full jitter — each delay is a random value between 0 and the computed cap. This spreads retries across the full window so bursts cannot synchronize. It is the approach recommended by AWS for its own APIs and is the best default for most cases.

Equal jitter — half the delay is fixed (the computed cap divided by 2) and the other half is randomized. This guarantees a minimum wait per attempt (so clients cannot accidentally retry immediately) while still spreading load.

A multiplier of 2 with full jitter is the well-tested default for most API clients.
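The three modes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculator's own code: the function name apply_jitter and the mode strings "none", "full", and "equal" are hypothetical labels chosen here.

```python
import random


def apply_jitter(computed, mode, rng=random):
    """Apply one of the three jitter modes to a computed backoff delay.

    rng may be the random module or a random.Random instance
    (pass a seeded instance for repeatable results).
    """
    if mode == "none":
        return computed                      # exact delay, no randomness
    if mode == "full":
        return rng.uniform(0.0, computed)    # anywhere between 0 and the computed delay
    if mode == "equal":
        half = computed / 2.0
        return half + rng.uniform(0.0, half)  # fixed half plus a randomized half
    raise ValueError(f"unknown jitter mode: {mode!r}")


# For a computed delay of 8 seconds:
rng = random.Random(0)
samples_full = [apply_jitter(8.0, "full", rng) for _ in range(1000)]
samples_equal = [apply_jitter(8.0, "equal", rng) for _ in range(1000)]
```

In every mode the result never exceeds the computed delay, which is why the worst-case budget can be calculated from the un-jittered schedule. Equal jitter additionally never drops below half of it.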

Which errors to retry — and which not to

Never retry indiscriminately. Retrying a 400 Bad Request burns your budget and delays surfacing the real problem. The rule is:

HTTP status	Retry?	Reason
429 Too Many Requests	Yes — honor Retry-After	Transient rate limiting
500 Internal Server Error	Yes	Transient server fault
502 Bad Gateway	Yes	Transient upstream fault
503 Service Unavailable	Yes	Server temporarily overloaded
504 Gateway Timeout	Yes	Upstream timeout
Network timeout / connection reset	Yes	Transient connectivity
400 Bad Request	No	Fix the request
401 Unauthorized	No	Fix the credentials
403 Forbidden	No	Fix permissions
404 Not Found	No	Fix the endpoint
422 Unprocessable Entity	No	Fix the request body

When the API returns a Retry-After header on a 429, use that value as your delay instead of the computed exponential delay. The calculator covers cases where no such header is returned.
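The table and the Retry-After rule can be combined into two small helpers, sketched here in Python. The names should_retry and next_delay are hypothetical, and the sketch assumes response headers arrive as a plain dict keyed exactly as "Retry-After". It handles only the delay-in-seconds form of the header; the HTTP-date form is left out and falls back to the computed delay.

```python
# Status codes the table above marks as retryable.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def should_retry(status):
    """Return True for transient HTTP statuses, False for client errors."""
    return status in RETRYABLE_STATUSES


def next_delay(status, headers, computed_delay):
    """Pick the wait before the next retry.

    On a 429 with a numeric Retry-After header, the server's value wins.
    Otherwise fall back to the computed exponential delay.
    """
    retry_after = headers.get("Retry-After")
    if status == 429 and retry_after is not None:
        value = retry_after.strip()
        if value.isdigit():          # delay-seconds form, e.g. "7"
            return float(value)
        # HTTP-date form is not parsed in this sketch.
    return computed_delay
```

Network timeouts and connection resets do not produce a status code, so a real client would treat the corresponding exceptions from its HTTP library as retryable in addition to the statuses above.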

Setting a realistic timeout budget

Your overall request timeout must be at least:

timeout = per-attempt latency × (max_retries + 1) + sum of all delays (worst case)

If a single LLM call can take up to 30 seconds and you allow 3 retries with a worst-case delay sum of 15 seconds, your client timeout should be at least 135 seconds — not 30. The schedule the calculator generates makes this arithmetic explicit before you discover it in a production outage.
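The same arithmetic, written as a short Python sketch. The function name timeout_budget is hypothetical, and the inputs are the figures from the example above.

```python
def timeout_budget(per_attempt_latency, max_retries, worst_case_delay_sum):
    """Minimum overall timeout: every attempt's latency plus all backoff waits."""
    attempts = max_retries + 1  # the original call plus each retry
    return per_attempt_latency * attempts + worst_case_delay_sum


# 30 s per call, 3 retries, 15 s of worst-case backoff.
budget = timeout_budget(per_attempt_latency=30.0, max_retries=3, worst_case_delay_sum=15.0)
```

Four attempts at 30 seconds each contribute 120 seconds, and the 15 seconds of backoff brings the budget to 135 seconds.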