How is the redundancy overhead calculated?

It is the share of traffic you send to the more expensive secondary provider multiplied by its price premium over the primary, applied to your monthly spend. Sending 20% of traffic to a provider that costs 15% more adds 3% to your bill.

How is the avoided downtime cost estimated?

The single-provider expected downtime cost per month equals downtime probability times minutes in a month times your cost per minute. Redundancy is assumed to cut that exposure sharply because a failover provider absorbs most of the outage.

What downtime probability should I use?

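Start from your provider's SLA: the downtime probability is one minus the advertised uptime, so 99.9% uptime gives 0.001 and 99.99% gives 0.0001. If you have your own observed incident history, use that instead, since it reflects the outages you actually experienced rather than the ones the provider promises to avoid.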
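As a minimal sketch of that conversion, the snippet below turns an advertised uptime percentage into a downtime probability and into expected downtime minutes. The function names and the 30-day month are assumptions made for illustration, not part of the planner itself.

```python
# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60


def downtime_probability(uptime_percent: float) -> float:
    """Convert an advertised uptime percentage (e.g. 99.9) into a downtime probability."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return (100.0 - uptime_percent) / 100.0


def expected_downtime_minutes(uptime_percent: float) -> float:
    """Expected minutes of downtime per month implied by the SLA."""
    return downtime_probability(uptime_percent) * MINUTES_PER_MONTH


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        print(
            f"{sla}% uptime -> p = {downtime_probability(sla):.4f}, "
            f"about {expected_downtime_minutes(sla):.1f} min/month"
        )
```

Under these assumptions a 99.9% SLA allows roughly 43 minutes of downtime per month.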

Is my data sent anywhere?

No. The planner runs entirely in your browser. Nothing you enter is uploaded, stored, or logged.

Does redundancy fully eliminate downtime cost?

No. Failover takes time, both providers can share an upstream dependency, and routing logic can fail. The tool models a strong but not perfect reduction, which is why some residual downtime cost remains.
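One way to picture the residual is to apply a reduction factor to the single-provider downtime cost. The sketch below is illustrative only: the `reduction` parameter and its 0.9 default are assumptions, since the exact factor the tool uses is not stated here.

```python
def avoided_downtime_cost(single_provider_cost: float, reduction: float = 0.9) -> float:
    """Downtime cost that redundancy removes.

    `reduction` is the assumed share of downtime cost eliminated by failover.
    The 0.9 default is a placeholder, not the planner's actual value.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return single_provider_cost * reduction


def residual_downtime_cost(single_provider_cost: float, reduction: float = 0.9) -> float:
    """Downtime cost that remains even with a second provider in place."""
    return single_provider_cost - avoided_downtime_cost(single_provider_cost, reduction)


if __name__ == "__main__":
    cost = 1_000.0  # hypothetical single-provider expected downtime cost per month
    print(f"avoided:  {avoided_downtime_cost(cost):.2f}")
    print(f"residual: {residual_downtime_cost(cost):.2f}")
```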

What is the Provider Redundancy Cost Planner?

It is a calculator that models the extra cost of routing LLM traffic across two providers for failover and compares it with the expected cost of single-provider downtime, so you can see whether redundancy pays for itself given your outage probability and per-minute downtime cost. It runs free in your browser on Gera Tools, with nothing uploaded.

Provider Redundancy Cost Planner

Name: Provider Redundancy Cost Planner
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Is multi-provider redundancy worth the cost?

Running on a single LLM provider is cheapest until that provider has an outage — then every minute of downtime costs you revenue, SLA credits, or churn. Routing a slice of traffic to a second provider buys resilience but adds cost. This planner puts the redundancy overhead next to the expected downtime cost it avoids so you can make the call with numbers.

How it works

The overhead is the premium you pay to send part of your traffic to a pricier secondary provider:

overhead = monthly_spend × secondary_traffic_share × price_premium

The avoided loss is your single-provider expected downtime cost, which redundancy largely removes:

single_provider_downtime_cost = downtime_probability
                              × minutes_per_month
                              × cost_per_minute

If avoided loss exceeds overhead, redundancy pays for itself.
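The two formulas and the comparison can be sketched in a few lines. This follows the simplified comparison as written above, treating the full single-provider downtime cost as the avoided loss. The `PlanInputs` class, the function names, the 30-day month, and the example figures are all assumptions for illustration.

```python
from dataclasses import dataclass

# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60


@dataclass
class PlanInputs:
    monthly_spend: float            # primary-provider spend per month
    secondary_traffic_share: float  # fraction of traffic sent to the secondary (0..1)
    price_premium: float            # secondary's extra price as a fraction (0.15 = 15% more)
    downtime_probability: float     # e.g. 0.001 for a 99.9% SLA
    cost_per_minute: float          # business cost of one minute of downtime


def redundancy_overhead(p: PlanInputs) -> float:
    """overhead = monthly_spend x secondary_traffic_share x price_premium"""
    return p.monthly_spend * p.secondary_traffic_share * p.price_premium


def single_provider_downtime_cost(p: PlanInputs) -> float:
    """downtime_probability x minutes_per_month x cost_per_minute"""
    return p.downtime_probability * MINUTES_PER_MONTH * p.cost_per_minute


def redundancy_pays(p: PlanInputs) -> bool:
    """True when the avoided loss exceeds the overhead."""
    return single_provider_downtime_cost(p) > redundancy_overhead(p)


if __name__ == "__main__":
    example = PlanInputs(
        monthly_spend=10_000.0,
        secondary_traffic_share=0.20,
        price_premium=0.15,
        downtime_probability=0.001,
        cost_per_minute=50.0,
    )
    print(f"overhead:      {redundancy_overhead(example):.2f}")
    print(f"downtime cost: {single_provider_downtime_cost(example):.2f}")
    print(f"pays off:      {redundancy_pays(example)}")
```

With these hypothetical inputs the overhead is 300 per month against an expected downtime cost of 2,160, so redundancy pays for itself.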

The architecture decisions that affect cost

Multi-provider redundancy is not a single architecture — there are several configurations, each with different cost implications:

Active-passive with automatic failover: All traffic goes to the primary provider. A secondary provider is provisioned and tested but receives no production traffic under normal conditions. When the primary fails, traffic automatically routes to the secondary. Cost overhead is near zero under normal conditions; failover brings temporary secondary charges. This is the most cost-efficient architecture when outages are rare.

Active-active load balancing: Traffic is continuously split between providers, typically 80/20 or similar. Both providers serve production traffic at all times, so the cost overhead is permanent but predictable: as a fraction of monthly spend it is secondary_share × price_premium. The benefit is that failover is much faster, because the secondary is already warm and serving requests, and you get continuous quality comparison data between providers.

Shadow routing with leader-based serving: All traffic is served by the primary. A fraction of requests is also sent asynchronously to the secondary (shadow mode) for quality comparison, but only the primary’s response is served. This provides readiness testing without exposing users to secondary responses. Because shadow requests are duplicates, they are billed on top of primary traffic: the cost overhead is the shadowed share of traffic at the secondary’s full price, not just its premium.

The planner models the active-passive and active-active cases directly. Shadow routing usually lands between the two as long as the shadowed fraction is kept small.
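To compare the three configurations side by side, the sketch below estimates the monthly overhead of each. It assumes the secondary costs the primary price times (1 + premium), that a passive standby carries no fixed fee, and that shadow requests are billed in full. The function names and example figures are hypothetical.

```python
def active_passive_overhead(
    monthly_spend: float, price_premium: float, failover_fraction: float
) -> float:
    """Premium paid only while failed over.

    `failover_fraction` is the assumed share of the month spent on the secondary.
    """
    return monthly_spend * failover_fraction * price_premium


def active_active_overhead(
    monthly_spend: float, secondary_share: float, price_premium: float
) -> float:
    """Permanent premium on the share of traffic routed to the secondary."""
    return monthly_spend * secondary_share * price_premium


def shadow_overhead(
    monthly_spend: float, shadow_share: float, price_premium: float
) -> float:
    """Shadow requests are duplicates, so they cost the secondary's full price."""
    return monthly_spend * shadow_share * (1.0 + price_premium)


if __name__ == "__main__":
    spend, premium = 10_000.0, 0.15  # hypothetical inputs
    print(f"active-passive: {active_passive_overhead(spend, premium, 0.001):.2f}")
    print(f"active-active:  {active_active_overhead(spend, 0.20, premium):.2f}")
    print(f"shadow (2%):    {shadow_overhead(spend, 0.02, premium):.2f}")
```

Note that the shadow figure grows quickly with the shadowed fraction, because each shadowed request is paid for twice.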

What downtime actually costs in practice

The per-minute downtime cost is the most important number in this calculation, and it is consistently underestimated. A complete accounting should include:

Direct revenue impact — for products where LLM features are the primary value proposition (copilots, AI assistants, AI-powered search), downtime during peak hours can mean complete loss of revenue for that period, plus users switching to alternatives permanently.

SLA penalties — if your product commits to uptime SLAs with enterprise customers, downtime directly creates financial liability. Compute the penalty clause per minute from your contracts.

Support load — outages generate support tickets. Even at conservative assumptions, a major outage can generate hundreds of tickets, each costing £5–20 in agent time.

Long-tail churn — users who experience downtime, especially during critical moments, have meaningfully higher churn rates in the days and weeks following. This is the hardest cost to quantify but often the largest.

Reputation and pipeline — for B2B products, a high-visibility outage can stall or lose enterprise deals that are in progress.

For a product generating meaningful revenue, the realistic per-minute cost is often 10–100× the raw API spend per minute.
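A rough way to assemble the per-minute figure from the components listed above is to sum them explicitly. The field names and every number in the example are hypothetical placeholders; substitute estimates from your own revenue data, contracts, and support records.

```python
from dataclasses import dataclass


@dataclass
class DowntimeCostInputs:
    revenue_per_minute: float        # direct revenue lost while the feature is down
    sla_penalty_per_minute: float    # contractual penalties, converted to a per-minute rate
    tickets_per_minute: float        # support tickets generated per minute of outage
    cost_per_ticket: float           # agent time per ticket
    churn_cost_per_minute: float = 0.0     # estimated long-tail churn, hardest to quantify
    pipeline_cost_per_minute: float = 0.0  # stalled or lost deals, if you can estimate them


def cost_per_minute(c: DowntimeCostInputs) -> float:
    """Sum every component into one per-minute downtime cost."""
    support = c.tickets_per_minute * c.cost_per_ticket
    return (
        c.revenue_per_minute
        + c.sla_penalty_per_minute
        + support
        + c.churn_cost_per_minute
        + c.pipeline_cost_per_minute
    )


if __name__ == "__main__":
    example = DowntimeCostInputs(
        revenue_per_minute=20.0,
        sla_penalty_per_minute=10.0,
        tickets_per_minute=0.5,
        cost_per_ticket=12.0,
        churn_cost_per_minute=14.0,
    )
    print(f"cost per minute: {cost_per_minute(example):.2f}")
```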

When redundancy clearly pays and when it probably does not

Redundancy clearly pays when:

Your product’s core value depends entirely on LLM features being available
You serve customers under contractual uptime commitments
Your primary provider’s SLA still allows tens of minutes of downtime per month (99.9% uptime permits about 43 minutes in a 30-day month)
The price difference between providers is under 20%

Redundancy probably does not pay when:

LLM features are supplementary, not core — users can continue using the product without them
Your product is in early development with low traffic
Both providers you are considering share significant infrastructure dependencies (the same cloud region, the same upstream provider)
The secondary provider’s quality is significantly lower, meaning failover would also generate support tickets

Tips for a sound decision

Use a realistic per-minute cost. Include lost revenue, SLA penalties, and support load — not just compute. This number dominates the result.
Do not assume perfect failover. Shared upstreams and slow detection mean redundancy reduces, not eliminates, downtime cost.
Right-size the secondary share. You rarely need a 50/50 split; a small warm standby plus fast failover often captures most of the benefit at a fraction of the overhead.
Test the failover path. Redundancy that has never been exercised is not reliable redundancy. Include regular failover drills in your operational plan.
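For the tip on right-sizing the secondary share, rearranging the overhead formula gives the largest share that still breaks even: share = avoided_loss / (monthly_spend × price_premium). The sketch below is a hypothetical helper, not a feature of the planner, and the example figures are made up.

```python
def break_even_share(
    monthly_spend: float, price_premium: float, avoided_loss: float
) -> float:
    """Largest secondary traffic share whose overhead does not exceed the avoided loss.

    The result is capped at 1.0, since you cannot route more than all traffic.
    """
    if avoided_loss < 0:
        raise ValueError("avoided_loss must not be negative")
    if monthly_spend <= 0 or price_premium <= 0:
        # No spend or no premium means no overhead, so any share breaks even.
        return 1.0
    return min(1.0, avoided_loss / (monthly_spend * price_premium))


if __name__ == "__main__":
    # Hypothetical: 10,000 monthly spend, 15% premium, 200 of avoided downtime cost.
    share = break_even_share(10_000.0, 0.15, 200.0)
    print(f"break-even secondary share: {share:.1%}")
```

Any share below that threshold costs less than the downtime it is expected to avoid, under the same simplifying assumptions as the formulas above.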