What is model routing?

Routing sends each request to the most appropriate model for its difficulty (cheap models for easy tasks, premium models for hard ones) instead of using one model for everything. Done well, it cuts cost with little quality loss.

How is the blended cost calculated?

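Each tier's share of requests is multiplied by its assigned model's per-request cost, and the results are summed into a blended per-request cost. That figure is then multiplied by your daily volume and by 30 days. The baseline prices every request on the premium model, meaning the model you assigned to the complex tier.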

How does the quality score work?

Each model carries a rough quality weight. The tool computes a traffic-weighted average so you can see how much quality you trade for the savings. Treat it as a directional indicator, not a benchmark.

Is any data sent to a server?

No. All routing math runs in your browser. Nothing you enter is uploaded, stored or logged.

What is the Model Routing Cost Optimizer?

Set the share of requests in each complexity tier, pick a model per tier, and see projected blended cost versus quality. Optimize routing across cheap, mid and premium models to cut LLM spend. It runs free in your browser on Gera Tools, with nothing uploaded.

Model Routing Cost Optimizer

Name: Model Routing Cost Optimizer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/


Using one premium model for every request is the simplest setup and the most expensive. Routing sends easy requests to a cheap model and hard requests to a premium one, often cutting spend significantly with minimal quality loss. This tool lets you split traffic by complexity, assign a model per tier, and instantly see the blended cost and a quality score against an all-premium baseline.

How it works

You define three complexity tiers (simple / medium / complex) as percentages of traffic and assign a model to each. The optimizer prices each tier at its model’s per-request cost and blends them:

blended_per_req = Σ (tier_share × model_per_req)
monthly         = blended_per_req × requests_per_day × 30
quality_score   = Σ (tier_share × model_quality)

The baseline prices every request on whatever you chose for the complex tier, so the saving shows what routing buys you versus “premium everywhere”.
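The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the `Tier` class, the `routing_summary` function, and the 0-100 quality scale are assumptions introduced here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    share: float         # fraction of traffic in this tier (0..1)
    cost_per_req: float  # per-request cost of the assigned model, in USD
    quality: float       # rough quality weight (a 0-100 scale is assumed here)


def routing_summary(tiers, baseline_cost_per_req, requests_per_day, days=30):
    """Return blended cost, monthly cost, baseline, saving and quality score."""
    total_share = sum(t.share for t in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"tier shares must sum to 1, got {total_share}")

    # blended_per_req = sum(tier_share * model_per_req)
    blended = sum(t.share * t.cost_per_req for t in tiers)
    # quality_score = sum(tier_share * model_quality)
    quality = sum(t.share * t.quality for t in tiers)

    monthly = blended * requests_per_day * days
    baseline_monthly = baseline_cost_per_req * requests_per_day * days
    return {
        "blended_per_req": blended,
        "monthly": monthly,
        "baseline_monthly": baseline_monthly,
        "monthly_saving": baseline_monthly - monthly,
        "quality_score": quality,
    }
```

Passing the complex tier's per-request cost as `baseline_cost_per_req` reproduces the "premium everywhere" comparison. The share check is there because a split that does not sum to 100% quietly skews both the cost and the quality score.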

The routing architecture

A real routing system has two parts: a classifier that assigns each request to a tier, and a dispatcher that sends the request to the right model. The classifier must be cheap enough that it does not eat the savings:

routing decision cost + cheap model cost < premium model cost
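As a rough sketch, the inequality above translates directly into a per-request check. The function names are hypothetical, and all costs are assumed to be in the same currency and per request.

```python
def routing_pays_off(classifier_cost, cheap_cost, premium_cost):
    """True when classifying and then using the cheap model beats premium."""
    return classifier_cost + cheap_cost < premium_cost


def max_classifier_budget(cheap_cost, premium_cost):
    """Exclusive upper bound on what the classifier may cost per request."""
    return premium_cost - cheap_cost
```

With illustrative prices of $0.001 for the cheap model and $0.02 for the premium one, the classifier has to cost less than about $0.019 per request for a routed request to come out cheaper.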

Common classifier approaches, cheapest first:

Keyword rules — check for domain-specific terms, length, or presence of code blocks
A tiny classification model — a purpose-trained small model that reads the first few sentences
A fast cheap model — ask a cheap model “is this a complex or simple request?” before routing

Because the classifier runs on every request, even a small per-request cost compounds quickly at scale.
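A keyword-rule classifier with a dispatcher might look like the sketch below. Everything specific in it is an assumption for illustration: the term lists, the length thresholds, and the placeholder model names are hypothetical and would need tuning against your own traffic.

```python
FENCE = "`" * 3  # Markdown code-fence marker, used to detect code blocks

# Hypothetical rule set: tune terms and thresholds from your own logs.
COMPLEX_TERMS = ("contract", "liability", "refactor", "stack trace")
MEDIUM_TERMS = ("summarize", "compare", "explain why")
LONG_PROMPT_CHARS = 2000
MEDIUM_PROMPT_CHARS = 400

# Placeholder model names, not real model identifiers.
MODEL_FOR_TIER = {
    "simple": "cheap-model",
    "medium": "mid-model",
    "complex": "premium-model",
}


def classify(prompt: str) -> str:
    """Assign a tier using only cheap string checks."""
    text = prompt.lower()
    if FENCE in prompt or len(prompt) > LONG_PROMPT_CHARS:
        return "complex"
    if any(term in text for term in COMPLEX_TERMS):
        return "complex"
    if len(prompt) > MEDIUM_PROMPT_CHARS or any(t in text for t in MEDIUM_TERMS):
        return "medium"
    return "simple"


def dispatch(prompt: str) -> tuple[str, str]:
    """Return the tier and the model the request should be sent to."""
    tier = classify(prompt)
    return tier, MODEL_FOR_TIER[tier]


def classifier_monthly_cost(cost_per_req, requests_per_day, days=30):
    """The classifier runs on every request, so its cost scales with volume."""
    return cost_per_req * requests_per_day * days
```

Rules like these cost almost nothing per request, which is why they sit first in the list. For the model-based options, `classifier_monthly_cost` shows how the overhead scales: a hypothetical classifier at $0.0002 per request adds about $6 per month at 1,000 requests per day, and about $6,000 per month at 1,000,000 requests per day.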

Example: calculating the saving

Suppose your app receives 1,000 requests per day, distributed as:

60% simple (FAQ-style, keyword lookups, short rewrites)
30% medium (multi-step reasoning, light summarization)
10% complex (legal analysis, code generation, long documents)

If the cheap model costs $0.001 per request, a mid-tier model $0.005, and the premium model $0.02:

routed blended = 0.60×0.001 + 0.30×0.005 + 0.10×0.02
               = 0.0006 + 0.0015 + 0.002
               = $0.0041 per request

baseline (premium only) = $0.02 per request

saving = (0.02 - 0.0041) / 0.02 = 0.795, or about a 79.5% cost reduction
monthly saving at 1,000 req/day = (0.02 - 0.0041) × 30,000 = $477/month

The numbers change dramatically with your actual tier split — the tool shows this instantly as you adjust the sliders.
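To get a feel for how sensitive the saving is to the split, the sketch below recomputes it for a few traffic mixes at the example prices. The first split is the one used above. The other two are hypothetical, and `saving_fraction` is a name introduced only for this illustration.

```python
# Per-request prices from the worked example, in USD.
PRICES = {"simple": 0.001, "medium": 0.005, "complex": 0.02}


def saving_fraction(split, prices=PRICES, baseline_tier="complex"):
    """Fraction of cost saved versus sending everything to the baseline tier's model."""
    blended = sum(share * prices[tier] for tier, share in split.items())
    baseline = prices[baseline_tier]
    return (baseline - blended) / baseline


splits = [
    {"simple": 0.60, "medium": 0.30, "complex": 0.10},  # split from the example
    {"simple": 0.30, "medium": 0.30, "complex": 0.40},  # hypothetical: heavy workload
    {"simple": 0.80, "medium": 0.15, "complex": 0.05},  # hypothetical: mostly FAQ
]

for split in splits:
    print(split, f"{saving_fraction(split):.0%}")
```

At these prices the three splits save roughly 80%, 51%, and 87%, so moving traffic between the simple and complex tiers changes the result far more than any other adjustment.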

Tips for real-world implementation

The biggest win is moving the simple tier. This is usually the largest share of traffic, and cheap models handle FAQ, classification, and short-generation tasks nearly as well as premium ones. Start there.
Log tier assignments and outcomes. After a few days you will see which simple requests the cheap model mishandled. Use those to refine the classifier or move edge cases to medium.
Add a confidence fallback. If your classifier or the cheap model expresses uncertainty, escalate to premium. As long as escalations stay rare, the added cost stays manageable.
Re-tune from real logs. Teams often overestimate complex-tier traffic. Audit a sample of “complex” requests: you will often find that many are simple tasks that arrived with extra words in the prompt.
Account for latency. Some cheap models are faster, some slower. If latency is customer-facing, factor in the P95 response time per tier, not just cost.
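The confidence fallback and the logging tip can be combined in one small wrapper, sketched below. It assumes each model call returns some confidence estimate between 0 and 1. How you obtain that number is left open, and the `Answer` type, the 0.7 threshold, and the log fields are all hypothetical choices.

```python
from typing import Callable, NamedTuple, Optional


class Answer(NamedTuple):
    text: str
    confidence: float  # 0..1; how this is estimated is up to your stack


Model = Callable[[str], Answer]


def answer_with_fallback(prompt: str, cheap: Model, premium: Model,
                         threshold: float = 0.7,
                         log: Optional[list] = None) -> Answer:
    """Try the cheap model first and escalate when its confidence is low."""
    first = cheap(prompt)
    escalated = first.confidence < threshold
    final = premium(prompt) if escalated else first
    if log is not None:
        # Record enough to audit routing decisions later.
        log.append({
            "prompt_chars": len(prompt),
            "cheap_confidence": first.confidence,
            "escalated": escalated,
        })
    return final


def expected_cost_per_req(cheap_cost, premium_cost, escalation_rate):
    """Escalated requests pay for both calls, so premium cost is added on top."""
    return cheap_cost + escalation_rate * premium_cost
```

With the example prices and a hypothetical 5% escalation rate, the expected cost is about $0.002 per request on that tier, double the cheap model alone but still a tenth of the premium price. The logged `escalated` flags give you the real escalation rate to plug back in.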