RAG vs fine-tuning cost breakeven
RAG and fine-tuning solve the same problem — giving a model knowledge — with opposite cost shapes. RAG is cheap to start but pays a per-request token tax forever, because every call stuffs retrieved chunks into the prompt. Fine-tuning pays a one-time fee and then carries almost no per-request overhead for that knowledge. This tool finds the request volume where fine-tuning’s upfront cost is repaid by RAG’s recurring overhead.
How it works
RAG’s recurring cost per request is extra tokens per request × inference cost per token,
where the extra tokens are the retrieved chunks added to each prompt. Fine-tuning is a
flat upfront cost with negligible marginal overhead for the baked-in knowledge. The
breakeven request count is fine-tuning cost / RAG cost per request. Beyond that many
requests, fine-tuning is the cheaper option cumulatively. Dividing the breakeven request
count by your daily volume converts it into a breakeven in days, so you can see whether
the crossover arrives in a week or in three years.
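The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name, parameter names, and the figures in the example are all hypothetical, and the prices are placeholders rather than any provider's real rates.

```python
def breakeven(extra_tokens_per_request: float,
              cost_per_token: float,
              fine_tune_cost: float,
              requests_per_day: float) -> tuple[float, float]:
    """Return (breakeven_requests, breakeven_days).

    Assumes fine-tuning adds no per-request overhead, as described above.
    """
    # Recurring RAG overhead: retrieved tokens times the price of one token.
    rag_cost_per_request = extra_tokens_per_request * cost_per_token
    if rag_cost_per_request <= 0:
        raise ValueError("RAG cost per request must be positive")
    if requests_per_day <= 0:
        raise ValueError("requests_per_day must be positive")

    # Requests needed for RAG's cumulative overhead to equal the upfront fee.
    breakeven_requests = fine_tune_cost / rag_cost_per_request
    # The same crossover expressed as time at the given daily volume.
    breakeven_days = breakeven_requests / requests_per_day
    return breakeven_requests, breakeven_days


# Illustrative inputs only: 2,000 retrieved tokens per call, $3 per million
# input tokens, a $500 fine-tuning run, and 5,000 requests per day.
requests, days = breakeven(2000, 3e-6, 500.0, 5000)
print(f"breakeven after {requests:,.0f} requests (~{days:.1f} days)")
```

With these placeholder numbers the RAG overhead is $0.006 per request, so the crossover lands at roughly 83,333 requests, or about 17 days at 5,000 requests per day.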
Tips and notes
- High volume plus large retrieved context brings the breakeven forward: fine-tuning often wins fast for chatbots answering the same domain thousands of times a day.
- If your knowledge changes weekly, RAG’s freshness usually outweighs a cost crossover: every retraining run is another fine-tuning bill, so retraining cadence is a hidden cost fine-tuning carries.
- Many production systems do both: fine-tune tone and format, RAG the volatile facts. Use this to size only the cost trade-off, not the architecture decision wholesale.
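
One rough way to factor the retraining tip into the cost comparison is to check whether the breakeven arrives before the next retrain is due. The sketch below assumes a fixed retraining schedule and that each retrain costs the same as the initial fine-tuning run. Both are simplifying assumptions, and the function and parameter names are hypothetical.

```python
def fine_tuning_pays_off(extra_tokens_per_request: float,
                         cost_per_token: float,
                         fine_tune_cost: float,
                         requests_per_day: float,
                         retrain_every_days: float) -> bool:
    """True if RAG overhead saved within one retraining interval
    exceeds the cost of one fine-tuning run.

    Assumption: every retrain costs the same as the initial run.
    """
    if retrain_every_days <= 0:
        raise ValueError("retrain_every_days must be positive")

    # Daily RAG overhead: retrieved tokens x token price x daily volume.
    rag_cost_per_day = extra_tokens_per_request * cost_per_token * requests_per_day
    if rag_cost_per_day <= 0:
        raise ValueError("daily RAG cost must be positive")

    # Days of RAG overhead needed to repay a single fine-tuning run.
    breakeven_days = fine_tune_cost / rag_cost_per_day
    # If the model is retrained before that point, the run never pays back.
    return breakeven_days < retrain_every_days


# Illustrative inputs only (placeholder prices, not real rates).
print(fine_tuning_pays_off(2000, 3e-6, 500.0, 5000, retrain_every_days=7))
print(fine_tuning_pays_off(2000, 3e-6, 500.0, 5000, retrain_every_days=90))
```

With these placeholder figures the breakeven is about 17 days, so a weekly retrain never repays itself while a quarterly one does. This covers cost only; it says nothing about answer quality or freshness.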