What Is Fine-Tuning an AI Model?

Teaching a pre-trained model new tricks without starting from scratch


Fine-tuning, defined

Large language models start life with pre-training: they learn general language and broad knowledge from an enormous corpus of text. Fine-tuning is the step that comes after. You take that already-capable base model and continue training it on a smaller, focused dataset of examples that demonstrate exactly the behaviour you want. The model’s internal weights shift slightly so it gets better at your specific task — answering in a particular format, adopting a brand voice, classifying support tickets, or following a niche instruction style — than the general model ever would out of the box. Crucially, you are not building a model from scratch; you are specialising one that already knows the language.

Supervised fine-tuning and the data it needs

The most common form is supervised fine-tuning (SFT). You assemble a dataset of input-output pairs — a prompt and the ideal response — and train the model to reproduce those outputs. If you want a model that always returns clean JSON, your examples pair messy requests with perfectly formatted JSON answers; if you want a specific support tone, your examples pair customer messages with model replies in that tone. What matters most here is data quality and consistency. A few hundred to a few thousand carefully curated, on-target examples usually outperform tens of thousands of inconsistent ones, because the model faithfully learns whatever patterns — including mistakes — appear in the data.
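
As a rough illustration of what such a dataset can look like, the sketch below builds a few prompt/response pairs for the clean-JSON case and drops any example whose response breaks the pattern. The example records, the `prompt`/`response` field names, the required keys, and the `is_consistent` helper are all hypothetical; real training tools each define their own expected file format.

```python
import json

# Hypothetical examples: messy requests paired with the exact JSON we want back.
examples = [
    {"prompt": "order 3 apples pls, ship to Leeds",
     "response": '{"item": "apple", "quantity": 3, "city": "Leeds"}'},
    {"prompt": "need 12 pens sent to Cork",
     "response": '{"item": "pen", "quantity": 12, "city": "Cork"}'},
    # Inconsistent example: the response is not JSON, so it should be dropped.
    {"prompt": "2 lamps -> Oslo",
     "response": "item: lamp, quantity 2"},
]

# Assumed schema for this illustration only.
REQUIRED_KEYS = {"item", "quantity", "city"}


def is_consistent(example):
    """Return True if the response is a JSON object with exactly the expected keys."""
    try:
        parsed = json.loads(example["response"])
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == REQUIRED_KEYS


# Keep only on-target examples, then serialise one JSON record per line (JSONL).
clean = [ex for ex in examples if is_consistent(ex)]
jsonl = "\n".join(json.dumps(ex) for ex in clean)

print(f"kept {len(clean)} of {len(examples)} examples")
```

A check like this is cheap to run before every training job, and it catches the kind of inconsistent example the model would otherwise learn to imitate.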

The cost: compute and parameter-efficient methods

Fine-tuning every weight in a large model (full fine-tuning) is expensive: it needs significant GPU memory and time, and it produces a full-sized copy of the model for each task. This is why parameter-efficient fine-tuning methods dominate in practice. The best known is LoRA (Low-Rank Adaptation), which freezes the original weights and trains only a small set of added low-rank “adapter” weights. QLoRA goes further: it quantises the frozen base model (typically to 4-bit precision) and trains LoRA adapters on top, which cuts memory use further. These methods slash the hardware required, let you fine-tune on modest GPUs, and yield small adapter files you can swap in and out, so one base model can serve many fine-tuned behaviours.
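
To make the saving concrete, here is a minimal back-of-the-envelope sketch. It assumes a single weight matrix of a hypothetical size (4096 by 4096) and a hypothetical LoRA rank of 8, and it counts only that one matrix; real models have many such matrices, and which ones receive adapters is a configuration choice. LoRA replaces the update to a `d_out` by `d_in` matrix with two small matrices of shapes `d_out` by `rank` and `rank` by `d_in`.

```python
def full_params(d_out, d_in):
    """Trainable parameters when every entry of the weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
d_out = d_in = 4096
rank = 8

full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
fraction = lora / full

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"adapter is {fraction:.2%} of the full matrix")
```

Under these assumptions the adapter trains well under one percent of the parameters in that matrix, which is why adapter files stay small enough to swap in and out.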

When fine-tuning is the right tool

Fine-tuning shines when you need a consistent skill, style, or format that is hard to achieve with prompting alone, especially at scale where shorter prompts save cost and latency. It is excellent for teaching the model how to behave. It is a poor choice for injecting facts that change, because updating knowledge means re-training, and stale facts get baked in. For current or private information, retrieval-augmented generation (RAG) is usually better, and for quick experiments, prompt engineering is faster and cheaper. The honest rule of thumb: reach for prompting first, RAG when you need fresh facts, and fine-tuning only when you need reliable, repeated behaviour that prompting cannot deliver economically.
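
The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the reasoning in this section, not a substitute for judgement: the function name `recommend` and its two yes/no inputs are hypothetical simplifications.

```python
def recommend(needs_fresh_or_private_facts, prompting_delivers_behaviour_economically):
    """Return the techniques to reach for, in the order the rule of thumb suggests."""
    # Prompting is always the first thing to try.
    plan = ["prompting"]
    # Current or private information is better served by retrieval.
    if needs_fresh_or_private_facts:
        plan.append("RAG")
    # Fine-tune only when prompting cannot give reliable, repeated behaviour at acceptable cost.
    if not prompting_delivers_behaviour_economically:
        plan.append("fine-tuning")
    return plan


# A support bot that must cite this week's policies, where prompting already works well:
print(recommend(needs_fresh_or_private_facts=True,
                prompting_delivers_behaviour_economically=True))

# A high-volume formatter where long prompts are too costly to keep sending:
print(recommend(needs_fresh_or_private_facts=False,
                prompting_delivers_behaviour_economically=False))
```

Returning a list rather than a single answer reflects that the techniques are not mutually exclusive: a system can retrieve fresh facts and also use a fine-tuned model for consistent behaviour.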
