Transfer Learning (AI Glossary)

Reusing knowledge learned on one task to speed up learning on another

Definition

Transfer learning is the technique of reusing knowledge a model has already acquired on one task to accelerate learning on a different task. Rather than training a fresh model from random weights, you start from a model that has already learned useful, general-purpose representations and adapt it to your specific problem. It is the central idea behind today’s foundation models: train once on a massive general corpus, then specialise many times.

The pre-train then fine-tune paradigm

Most modern deep learning systems follow a two-stage recipe:

  1. Pre-training: a model learns broad patterns from an enormous, often unlabelled dataset. For language models this usually means predicting the next token over very large text corpora, much of it gathered from the web; for vision models it means training on millions of images, from which the network learns edges, textures, and shapes.
  2. Fine-tuning: that pre-trained model is then trained further on a much smaller, task-specific dataset (legal documents, medical images, your support tickets) so that it specialises in the target task.

The pre-trained model carries general competence; fine-tuning steers it toward a particular job.
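
As a toy illustration of the two stages, the sketch below "pre-trains" a one-variable linear model on a large synthetic dataset and then fine-tunes it for a few steps on a small, related one. Everything in it is an assumption made for illustration: the data, the model, the learning rate, and the helper names make_data, mse, and train are hypothetical, and real pre-training involves far larger models and datasets.

```python
def make_data(slope, intercept, n):
    """Return n noise-free points on y = slope * x + intercept, with x evenly spaced in [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(x, slope * x + intercept) for x in xs]


def mse(params, data):
    """Mean squared error of the line (w, b) on the given points."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, steps, lr=0.1):
    """Full-batch gradient descent on mean squared error; returns the updated (w, b)."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Stage 1: "pre-train" from zero weights on a large, general dataset.
source = make_data(slope=2.0, intercept=1.0, n=1000)
pretrained = train((0.0, 0.0), source, steps=500)

# Stage 2: fine-tune on a small dataset from a related but different task.
target = make_data(slope=2.2, intercept=0.8, n=10)
fine_tuned = train(pretrained, target, steps=5)

# Baseline: the same short training run, but starting from zero weights.
from_scratch = train((0.0, 0.0), target, steps=5)

ft_loss = mse(fine_tuned, target)
scratch_loss = mse(from_scratch, target)
print(f"fine-tuned loss:   {ft_loss:.4f}")
print(f"from-scratch loss: {scratch_loss:.4f}")
```

Under these assumptions the fine-tuned model finishes with a much lower error on the target data than the model trained from scratch for the same five steps, because it starts close to a good solution.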

Why it works — and why it saves resources

The costly part of building an AI system is learning general representations, such as how language is structured or which visual features matter. Once a model has those, the remaining work is mostly adaptation, which is far cheaper. As a result, transfer learning can sharply reduce the labelled data and compute a downstream task needs: a few thousand examples and a short training run can rival a model trained from scratch on millions of examples.

Approaches: feature extraction vs fine-tuning

The approaches differ in how much of the original model you change:

  • Feature extraction: freeze the pre-trained network and train only a small new “head” on top. Fast, cheap, and resistant to overfitting on small datasets.
  • Full or partial fine-tuning: unfreeze some or all of the original weights and update them too. More expensive and data-hungry, but capable of higher accuracy when the new task differs more from the original.
  • Parameter-efficient methods (e.g. LoRA): keep the original weights frozen and train only a small number of added parameters (in LoRA, a pair of low-rank matrices beside each adapted weight matrix), capturing much of fine-tuning’s benefit at a fraction of the cost.
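
To make the cost difference between these approaches concrete, the sketch below counts how many parameters each one would train. The model is hypothetical: a stack of 12 square dense layers of width 768 with a 4-class head, and LoRA adapters of rank 8 on every layer. These sizes, and the helper names linear_params and lora_params, are assumptions chosen for illustration, not figures from any particular model.

```python
def linear_params(d_in, d_out):
    """Weights plus biases of one dense layer."""
    return d_in * d_out + d_out


def lora_params(d_in, d_out, rank):
    """Parameters in the two low-rank factors LoRA adds beside a frozen weight
    matrix: A with shape (rank, d_in) and B with shape (d_out, rank)."""
    return rank * (d_in + d_out)


# Hypothetical sizes, for illustration only.
HIDDEN = 768
LAYERS = 12
CLASSES = 4
RANK = 8

backbone = LAYERS * linear_params(HIDDEN, HIDDEN)  # pre-trained weights
head = linear_params(HIDDEN, CLASSES)              # new task-specific layer
total = backbone + head

trainable = {
    # Backbone frozen; only the new head is updated.
    "feature extraction": head,
    # Every weight in the model is updated.
    "full fine-tuning": backbone + head,
    # Backbone frozen; low-rank adapters and the head are updated.
    "LoRA + head": LAYERS * lora_params(HIDDEN, HIDDEN, RANK) + head,
}

for name, count in trainable.items():
    share = 100 * count / total
    print(f"{name:20s} {count:>10,d} trainable ({share:.2f}% of the original model)")
```

The counts say nothing about accuracy; they only show why freezing most of the network makes training cheaper in memory and compute.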

Where you see it

Transfer learning is everywhere in practice. Every fine-tuned LLM, every image classifier built on a pre-trained backbone, and every embedding model adapted to a domain is an instance of it. It is also why zero-shot and few-shot prompting work at all: a sufficiently capable pre-trained model already carries enough transferable knowledge to handle new tasks from instructions alone (zero-shot) or from a handful of examples placed in the prompt (few-shot), with no weight updates at all.
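
A few-shot prompt is simply text that places worked examples ahead of the new input. The sketch below assembles one; the function name build_few_shot_prompt, the "Text:" / "Label:" layout, and the sample tickets are all hypothetical, and the resulting string would still need to be sent to a language model of your choice.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked (text, label) examples, and a new input
    into a single prompt string that ends where the model should answer."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


# Made-up examples for illustration.
examples = [
    ("The refund arrived within a day.", "positive"),
    ("The app crashes every time I log in.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify each support ticket as positive or negative.",
    examples,
    "Still waiting on a reply after two weeks.",
)
print(prompt)
```

Passing an empty list of examples yields a zero-shot prompt: the instruction and the new input only.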
