What counts as the upfront cost?

Two things — the API cost of generating teacher outputs for your training set, plus the fine-tuning job cost. Together these are the investment you must earn back through cheaper inference.

How is the payback period calculated?

Payback days equal the total upfront cost divided by daily inference savings. Daily savings equal your per-request cost reduction times your daily request volume.

Is my data sent anywhere?

No. The calculator runs entirely in your browser. Nothing you enter is uploaded, stored or logged.

What is the Knowledge Distillation Cost-ROI Calculator?

It is a free calculator for the return on investment of knowledge distillation. It models the cost of generating teacher outputs with GPT-4o, the cost of fine-tuning a smaller student model, and the per-request inference savings, then reports your break-even volume and payback period. It runs in your browser on Gera Tools, with nothing uploaded.

Name: Knowledge Distillation Cost-ROI Calculator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Knowledge distillation ROI calculator

Distilling a large model like GPT-4o into a smaller fine-tuned student can cut your per-request cost dramatically — but only if your volume is high enough to earn back the upfront investment. This calculator weighs the one-time cost of generating teacher data and fine-tuning against the ongoing per-request savings to find your break-even point.

What is knowledge distillation?

Knowledge distillation is the process of training a smaller “student” model to replicate the behaviour of a larger, more expensive “teacher” model. Rather than learning from raw labels, the student learns from the teacher’s outputs — the probability distributions or structured responses the teacher produces on a set of carefully curated inputs. The result is a compact model that approximates the teacher’s quality on the target task at a fraction of the inference cost.

The technique is especially relevant for LLM-based applications because large frontier models charge per token and can be expensive at scale. A well-distilled student model hosted on a smaller endpoint or a cheaper API tier can handle the same workload at significantly lower cost once the training investment is amortised.

How it works

There are two costs to recover. First, you spend GPT-4o tokens generating high-quality labeled outputs for your training set. Second, you pay for the fine-tuning job itself. Together these form the upfront investment. Each request served by the cheaper student model then returns a fixed saving.

upfront      = teacher_generation_cost + fine_tuning_cost
daily_saving = inference_cost_delta × daily_requests
payback_days = upfront / daily_saving
year_net     = daily_saving × 365 − upfront

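If your volume is low, the payback may stretch beyond a year, in which case staying on the large model is usually the more economical choice. If the student is not cheaper per request, the daily saving is zero or negative and the investment never pays back.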
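
The four formulas above translate directly into code. The following is a minimal Python sketch, not the calculator's actual implementation; the names `RoiResult` and `distillation_roi` and the parameter names are illustrative assumptions. It returns no payback period when the daily saving is zero or negative, since the upfront cost is then never recovered.

```python
from typing import NamedTuple, Optional


class RoiResult(NamedTuple):
    upfront: float
    daily_saving: float
    payback_days: Optional[float]  # None when the cost is never recovered
    year_net: float


def distillation_roi(
    teacher_generation_cost: float,
    fine_tuning_cost: float,
    inference_cost_delta: float,  # per-request saving: teacher cost minus student cost
    daily_requests: float,
) -> RoiResult:
    """Apply the four formulas above (hypothetical helper, not the tool's own code)."""
    upfront = teacher_generation_cost + fine_tuning_cost
    daily_saving = inference_cost_delta * daily_requests
    # A zero or negative saving means there is no break-even point.
    payback_days = upfront / daily_saving if daily_saving > 0 else None
    year_net = daily_saving * 365 - upfront
    return RoiResult(upfront, daily_saving, payback_days, year_net)
```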

Illustrative example

Suppose generating teacher outputs for a training set costs $200 in API fees and the fine-tuning job costs $150 — a total upfront investment of $350. After distillation the per-request cost drops by $0.005. At 500 daily requests, the daily saving is $2.50, giving a payback period of 350 ÷ 2.50 = 140 days. Over a full year the net saving would be (2.50 × 365) − 350 = $562.50. These are illustrative numbers; enter your real figures to see your own break-even.
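
The same formulas can be inverted to find the break-even volume: the smallest daily request count that recovers the upfront cost within a chosen number of days. A minimal Python sketch follows, using the illustrative figures above; the function name `break_even_daily_requests` and the 365-day default target are assumptions chosen for illustration.

```python
import math


def break_even_daily_requests(
    upfront: float,
    inference_cost_delta: float,
    target_days: float = 365,
) -> int:
    """Smallest whole daily volume with payback_days <= target_days."""
    if inference_cost_delta <= 0 or target_days <= 0:
        raise ValueError("need a positive per-request saving and target period")
    # payback_days = upfront / (delta * requests)  =>  requests = upfront / (delta * days)
    return math.ceil(upfront / (inference_cost_delta * target_days))


# Illustrative figures from the example: $350 upfront, $0.005 saved per request.
print(break_even_daily_requests(350, 0.005))  # 192
```

With these figures, about 192 requests per day are enough to pay back within a year, which is consistent with 500 daily requests paying back in 140 days.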

Factors that shift the break-even

Training set size — more examples improve student quality but cost more teacher tokens to generate; there is a diminishing-returns point.
Task difficulty — distillation works best for narrow, consistent tasks (classification, extraction, formatting). Open-ended generation distills less cleanly and may need a larger student model.
Model pricing — as large-model pricing drops, the per-request saving shrinks. Re-run the calculator when pricing changes.
Student model quality gate — if the student degrades on edge cases you must either invest in more training data or fall back to the teacher for those inputs, reducing the effective saving.
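
The quality-gate factor can be folded into the payback formula. The sketch below assumes that a fraction of requests, here called `fallback_rate` (a hypothetical parameter, not a calculator input), is routed straight to the teacher and therefore saves nothing. If the student is tried first and the teacher is only called afterwards, the real saving would be lower still.

```python
from typing import Optional


def payback_with_fallback(
    upfront: float,
    inference_cost_delta: float,
    daily_requests: float,
    fallback_rate: float,  # assumed share of requests sent to the teacher, 0.0 to 1.0
) -> Optional[float]:
    """Payback days when only (1 - fallback_rate) of requests earn the saving."""
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    effective_delta = inference_cost_delta * (1.0 - fallback_rate)
    daily_saving = effective_delta * daily_requests
    # No payback if nothing is saved (e.g. every request falls back).
    return upfront / daily_saving if daily_saving > 0 else None
```

For instance, with $350 upfront, a $0.005 saving and 500 daily requests, a 20% fallback rate lengthens the payback from 140 days to about 175 days.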

Tips and notes

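Volume is everything. Distillation pays back fast at thousands of daily requests and can take years at dozens: with the illustrative figures above, 5,000 daily requests pay back in 14 days, while 50 daily requests take 1,400 days.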
Budget for evaluation. A student model needs a quality gate before it replaces the teacher; factor that effort in even though it is not a token cost.
Re-distill as the teacher improves. When the teacher model gets cheaper or better, re-run the math — the break-even shifts.