Model routing cost optimizer
Using one premium model for every request is the simplest setup and the most expensive. Routing sends easy requests to a cheap model and hard requests to a premium one, which can often cut spend by 50-80% with little quality loss, depending on your traffic mix and model prices. This tool lets you split traffic by complexity, assign a model per tier, and immediately see the blended cost and quality score compared with an all-premium baseline.
How it works
You define three complexity tiers (simple / medium / complex) as percentages of traffic that sum to 100%, and assign a model to each. The optimizer prices each tier at its model’s per-request cost and blends them, weighting each tier by its share of traffic (expressed as a fraction):
blended_per_req = Σ (tier_share × model_per_req)
monthly = blended_per_req × requests_per_day × 30
quality_score = Σ (tier_share × model_quality)
The baseline prices every request at the per-request cost of the model you assigned to the complex tier, so the saving shows what routing buys you versus “premium everywhere”.
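As a worked illustration of the formulas above, here is a minimal Python sketch. The `Model` class, the `blend` function, the model names, and all prices and quality scores are made-up assumptions for the example, not values taken from the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    per_req: float  # assumed cost per request in dollars (made-up figure)
    quality: float  # assumed quality score on a 0-100 scale (made-up figure)


def blend(shares, models, requests_per_day):
    """Apply the blending formulas; shares are fractions of traffic per tier."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1.0")
    blended_per_req = sum(shares[t] * models[t].per_req for t in shares)
    quality_score = sum(shares[t] * models[t].quality for t in shares)
    monthly = blended_per_req * requests_per_day * 30
    # Baseline: every request priced on the model assigned to the complex tier.
    baseline_monthly = models["complex"].per_req * requests_per_day * 30
    return {
        "blended_per_req": blended_per_req,
        "monthly": monthly,
        "quality_score": quality_score,
        "baseline_monthly": baseline_monthly,
        "saving_pct": 100 * (1 - monthly / baseline_monthly),
    }


# Hypothetical models and traffic split, chosen only to make the arithmetic easy.
models = {
    "simple": Model("cheap", per_req=0.001, quality=80),
    "medium": Model("mid", per_req=0.004, quality=90),
    "complex": Model("premium", per_req=0.02, quality=97),
}
shares = {"simple": 0.5, "medium": 0.3, "complex": 0.2}

result = blend(shares, models, requests_per_day=10_000)
print(f"monthly: ${result['monthly']:,.0f}, saving: {result['saving_pct']:.1f}%")
```

With these assumed numbers the blended cost is $0.0057 per request, or about $1,710 per month against a $6,000 all-premium baseline (a saving of roughly 71.5%), while the quality score drops from 97 to about 86.4.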
Tips
- The biggest wins come from moving the simple tier off the premium model, since that tier is usually the largest share of traffic.
- Use a cheap classifier (even keyword rules or a tiny model) to decide the tier; the routing decision must cost far less than the saving.
- Add a confidence fallback: if the cheap model is unsure, escalate to the premium model. An escalated request pays for both calls, so cost stays low only as long as escalations are rare.
- Re-tune the tier shares from real logs — teams routinely overestimate how many requests truly need the premium model.
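
The classifier and fallback tips above could be sketched roughly as follows. This is only an illustration: the keyword lists, the 0.7 threshold, and the `call_model` callable (assumed to return an answer together with a confidence between 0 and 1) are hypothetical and would need to be replaced by whatever your stack provides.

```python
# Hypothetical keyword hints; a real deployment would derive these from logs.
SIMPLE_HINTS = ("translate", "summarize", "format")
COMPLEX_HINTS = ("prove", "refactor", "multi-step")


def classify(prompt):
    """Cheap rule-based tiering: complex hints win, then short simple requests."""
    text = prompt.lower()
    if any(hint in text for hint in COMPLEX_HINTS):
        return "complex"
    if len(text) < 400 and any(hint in text for hint in SIMPLE_HINTS):
        return "simple"
    return "medium"


def route(prompt, call_model, tier_models, threshold=0.7):
    """Try the tier's model first; escalate to the complex-tier model if unsure.

    call_model(model_name, prompt) is an assumed interface returning
    (answer, confidence). Returns (answer, model_used, escalated).
    """
    tier = classify(prompt)
    answer, confidence = call_model(tier_models[tier], prompt)
    if tier != "complex" and confidence < threshold:
        answer, _ = call_model(tier_models["complex"], prompt)
        return answer, tier_models["complex"], True
    return answer, tier_models[tier], False


def expected_cost_with_fallback(cheap_per_req, premium_per_req, escalation_rate):
    """An escalated request pays for the cheap call and the premium call."""
    return cheap_per_req + escalation_rate * premium_per_req
```

For example, with an assumed cheap price of $0.001, a premium price of $0.02, and 5% of requests escalating, `expected_cost_with_fallback(0.001, 0.02, 0.05)` gives about $0.002 per request, still a tenth of the premium price. Logging the `escalated` flag alongside each tier is one way to get the real data needed to re-tune the tier shares.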