If you want a language model to use your data or behave a certain way, you have two main levers: Retrieval-Augmented Generation (RAG) and fine-tuning. They are often presented as rivals, but they solve different problems. Choosing wrongly wastes money and produces disappointing results, so it helps to understand exactly what each one changes.
What RAG actually does
RAG leaves the model untouched. Instead, at query time you search a knowledge base — usually a vector database of your documents — find the passages most relevant to the user’s question, and paste them into the prompt as context. The model then answers using that retrieved material. Because the knowledge lives outside the model, you can update it any time simply by re-indexing documents, and you can show users the exact sources behind an answer. This makes RAG ideal for question-answering over manuals, policies, support articles, or anything that changes frequently.
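The retrieve-then-prompt loop described above can be sketched in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy word-count stand-in for a real embedding model, the in-memory list stands in for a vector database, and the names `retrieve`, `build_prompt`, and `KNOWLEDGE_BASE` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str], k: int = 2) -> str:
    """Paste the retrieved passages into the prompt as numbered, citable sources."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieve(query, passages, k)))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical knowledge base; updating it requires no change to the model.
KNOWLEDGE_BASE = [
    "Refund policy: purchases can be returned within 30 days.",
    "Support hours are Monday through Friday, 9am until 5pm.",
    "Shipping within Europe takes 5-7 business days.",
]

prompt = build_prompt("What is the refund policy?", KNOWLEDGE_BASE, k=1)
print(prompt)
```

Because each source is numbered in the prompt, the model's answer can point back to the passage it relied on, which is where the auditability comes from.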
What fine-tuning actually does
Fine-tuning takes a base model and continues training it on your own input-output examples, adjusting its weights. The result is a model that has internalised a behaviour: a house writing style, a strict output format, a classification scheme, or a domain’s phrasing. Fine-tuning shines when you need consistent behaviour that is hard to specify in a prompt, or when you want to shorten prompts at high volume by moving instructions into the weights. It is a poor and costly way to teach new facts, because knowledge in weights is hard to update and can surface as confident, outdated answers.
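To make "input-output examples" concrete, here is a sketch of preparing a small training set for a classification scheme. The record layout is an assumption: providers and training libraries each define their own format, so the `prompt` and `completion` field names, the label set, and the helper names are hypothetical and would need to match whatever your tooling expects.

```python
import json

# Hypothetical examples teaching a strict output format: exactly one label per ticket.
EXAMPLES = [
    {"prompt": "Ticket: My invoice shows the wrong amount.", "completion": "billing"},
    {"prompt": "Ticket: The app crashes when I log in.", "completion": "technical"},
    {"prompt": "Ticket: How do I change my delivery address?", "completion": "account"},
]

ALLOWED_LABELS = {"billing", "technical", "account"}


def validate(example: dict) -> None:
    """Reject records that would teach the model an inconsistent format."""
    if set(example) != {"prompt", "completion"}:
        raise ValueError(f"unexpected fields: {sorted(example)}")
    if example["completion"] not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {example['completion']!r}")


def to_jsonl(examples: list[dict]) -> str:
    """Serialise validated examples as JSON Lines, one record per line."""
    for ex in examples:
        validate(ex)
    return "\n".join(json.dumps(ex) for ex in examples)


jsonl = to_jsonl(EXAMPLES)
print(jsonl)
```

The validation step matters more than the serialisation: a fine-tuned model reproduces whatever pattern the examples show, including their mistakes.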
Comparing the tradeoffs
- Data freshness. RAG wins decisively — update the index, not the model.
- Behaviour and style. Fine-tuning wins — it bakes the pattern in.
- Upfront cost. Low for RAG; fine-tuning requires a training run and curated data.
- Per-query cost. RAG adds retrieval and larger prompts; fine-tuning can trim prompts.
- Latency. RAG adds a retrieval step; a fine-tuned model with short prompts can be faster.
- Transparency. RAG can cite sources; fine-tuned knowledge is opaque.
- Implementation effort. RAG needs a vector store and chunking pipeline; fine-tuning needs a clean labelled dataset and evaluation.
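The cost items above trade a one-off expense against a per-query one, and a rough calculation shows how they interact. Every number below is made up for illustration, and the sketch assumes the base and fine-tuned models charge the same per-token price, which may not hold for your provider.

```python
def extra_prompt_cost(extra_tokens_per_query: int, queries: int, price_per_million_tokens: float) -> float:
    """Added spend from the retrieved context that RAG pastes into each prompt."""
    return extra_tokens_per_query * queries * price_per_million_tokens / 1_000_000


def break_even_queries(training_cost: float, tokens_saved_per_query: int, price_per_million_tokens: float) -> float:
    """Queries needed before shorter prompts repay the one-off fine-tuning cost."""
    return training_cost * 1_000_000 / (tokens_saved_per_query * price_per_million_tokens)


# Hypothetical inputs, not real pricing.
PRICE_PER_MILLION_TOKENS = 1.0   # currency units per million prompt tokens
RAG_CONTEXT_TOKENS = 1_500       # retrieved passages added to each prompt
INSTRUCTION_TOKENS_SAVED = 800   # instructions moved from the prompt into the weights
TRAINING_COST = 400.0            # one-off cost of the training run

rag_overhead = extra_prompt_cost(RAG_CONTEXT_TOKENS, 100_000, PRICE_PER_MILLION_TOKENS)
break_even = break_even_queries(TRAINING_COST, INSTRUCTION_TOKENS_SAVED, PRICE_PER_MILLION_TOKENS)

print(f"RAG context overhead for 100,000 queries: {rag_overhead:.2f}")
print(f"Fine-tuning pays for itself after {break_even:,.0f} queries")
```

With these illustrative inputs the training run is repaid only after 500,000 queries, which is why trimming prompts through fine-tuning is mainly a high-volume argument.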
A decision guide
Ask what problem you are really solving. If the issue is “the model does not know my company’s current facts,” use RAG: it is quicker to set up, cheaper to update, and auditable. If the issue is “the model will not consistently follow my format or tone no matter how I prompt it,” fine-tune. If you need both current knowledge and consistent behaviour, do both: fine-tune for behaviour and layer RAG on top for facts. Start with prompting and RAG before reaching for fine-tuning, because they are cheaper to iterate on and solve the majority of real-world needs without a training run.
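The guide can be written down as a small function, which makes the order of the checks explicit. This is only one way to encode the advice; the function name, parameters, and returned strings are hypothetical.

```python
def recommend(needs_current_facts: bool, needs_consistent_behaviour: bool, prompting_exhausted: bool) -> str:
    """Map the two underlying problems to an approach, trying cheaper options first."""
    # Fine-tuning is only suggested once prompting has failed to fix the behaviour.
    fine_tune = needs_consistent_behaviour and prompting_exhausted
    if needs_current_facts and fine_tune:
        return "fine-tune for behaviour, RAG for facts"
    if needs_current_facts:
        return "RAG"
    if fine_tune:
        return "fine-tune"
    return "prompting"


print(recommend(needs_current_facts=True, needs_consistent_behaviour=False, prompting_exhausted=False))
print(recommend(needs_current_facts=True, needs_consistent_behaviour=True, prompting_exhausted=True))
```

The `prompting_exhausted` flag reflects the advice to reach for fine-tuning last: a behaviour problem that prompting has not yet been tried on returns "prompting" rather than "fine-tune".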