What Is a Foundation Model?

Pre-trained on everything, fine-tuned for anything: the paradigm shift in AI


What a foundation model is

A foundation model is a single, large AI model trained on broad, general data and then adapted to many different downstream tasks. The name, popularised by Stanford researchers in 2021, captures the idea that one model acts as a foundation you build many applications on top of — instead of training a fresh, narrow model for every problem. Familiar examples include the GPT and Claude families for text, CLIP for images and text, and various code and speech models. This “train once, adapt many times” pattern is the defining shift in modern AI.

Pre-training: learn from everything

The first phase is pre-training. The model ingests enormous amounts of mostly unlabelled data — much of the web, books, code repositories, images — using self-supervised objectives such as predicting the next token. This stage is extremely expensive, requiring large clusters of accelerators running for weeks, but it only has to happen once. What comes out is a general-purpose model with broad knowledge and skills that nobody programmed in explicitly.
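To make the idea of a self-supervised objective concrete, here is a minimal sketch of next-token prediction on a toy corpus. It assumes whitespace tokenisation and uses a bigram counting model as a stand-in for the neural network. Real foundation models use learned subword tokenisers and far larger architectures, and all function names here are illustrative. What the sketch does show is that the training labels come for free: every target is just the token that follows its context in the raw text, and fitting the data lowers the average cross-entropy of those targets.

```python
import math
from collections import Counter, defaultdict


def next_token_pairs(tokens, context_size):
    """Turn an unlabelled token sequence into (context, target) training pairs.

    No human labels are needed: each target is simply the token that follows its context.
    """
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs


def train_bigram(tokens):
    """Toy stand-in for a neural network: count which token follows which."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_prob(counts, vocab_size, prev, nxt, alpha=1.0):
    """Add-alpha smoothed estimate of P(nxt | prev), so unseen pairs keep some probability."""
    following = counts.get(prev, Counter())
    return (following[nxt] + alpha) / (sum(following.values()) + alpha * vocab_size)


def next_token_loss(counts, vocab_size, tokens):
    """Average cross-entropy (negative log-likelihood) of each token given the one before it."""
    nll = -sum(math.log(next_token_prob(counts, vocab_size, p, n))
               for p, n in zip(tokens, tokens[1:]))
    return nll / (len(tokens) - 1)


# Whitespace tokenisation is a simplification; real models use learned subword tokenisers.
tokens = "the cat sat on the mat the cat ate".split()
vocab_size = len(set(tokens))

print(next_token_pairs(tokens, context_size=2)[:3])
# [(('the',), 'cat'), (('the', 'cat'), 'sat'), (('cat', 'sat'), 'on')]

model = train_bigram(tokens)
untrained = defaultdict(Counter)  # no counts yet: every next token is equally likely
print(f"untrained loss: {next_token_loss(untrained, vocab_size, tokens):.3f}")  # 1.792
print(f"trained loss:   {next_token_loss(model, vocab_size, tokens):.3f}")      # 1.279
```

The trained loss is measured on the same toy sentence the counts came from, so it only shows that fitting the data lowers the objective, not that the model generalises.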

Adaptation: specialise for anything

The second phase is adaptation, where the general model is pointed at a specific task. There are several ways to do this, ranging from cheap to involved:

  • Prompting — describe the task in plain language, with optional examples, and the model responds with no weight changes at all.
  • Fine-tuning — continue training on a smaller labelled dataset so the model specialises, sometimes updating only lightweight adapters to save cost.
  • Instruction tuning and RLHF (reinforcement learning from human feedback) — train the model to follow instructions and to give the kinds of responses people prefer.
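Prompting is the one adaptation method that needs no training at all, because the task lives entirely in the input text. The sketch below assembles a few-shot prompt from an instruction and a handful of worked examples. The "Input:"/"Output:" layout and the function name are assumptions chosen for illustration, not a format any particular model requires, and sending the prompt to a model is left out because that depends on the provider's API.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a plain-text prompt: an instruction, worked examples, then the new input.

    The model's weights are never touched; the examples alone tell it what the task is.
    """
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")  # blank line separates examples
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Loved every minute.", "positive"), ("A waste of two hours.", "negative")],
    "The acting was superb.",
)
print(prompt)
```

Ending the prompt on an open "Output:" line invites the model to continue the pattern set by the examples.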

Because adaptation is far cheaper than pre-training, a single foundation model can power thousands of products.
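One reason adaptation costs so much less is that adapter methods train only a small fraction of the weights. The arithmetic below compares full fine-tuning of a single weight matrix with a low-rank adapter in the style of LoRA (low-rank adaptation), one widely used adapter technique, which freezes the original matrix and learns two thin matrices whose product is added to it. The 4096 x 4096 matrix size and the rank of 8 are illustrative assumptions, not figures from any particular model.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when every entry of a d_out x d_in weight matrix is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA-style update W + B @ A.

    W (d_out x d_in) stays frozen; only B (d_out x rank) and A (rank x d_in) are trained.
    """
    return d_out * rank + rank * d_in


# Illustrative sizes (assumptions, not taken from any specific model).
d_model, rank = 4096, 8
full = full_finetune_params(d_model, d_model)
adapter = low_rank_adapter_params(d_model, d_model, rank)

print(f"full fine-tuning: {full:,} trainable parameters")    # 16,777,216
print(f"low-rank adapter: {adapter:,} trainable parameters")  # 65,536
print(f"reduction: {full // adapter}x")                       # 256x
```

For this one matrix the adapter trains 256 times fewer parameters. The saving across a whole model depends on which layers receive adapters and on the chosen rank, and the frozen base weights still have to be loaded for every forward pass.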

Emergent capabilities at scale

A striking property of foundation models is emergence: certain abilities appear only after the model crosses a threshold of size and data, and are largely absent in smaller versions. Skills like in-context learning (picking up a task from examples in the prompt), multi-step reasoning, and translating between languages without being explicitly trained to translate can emerge from scale alone. This is a major reason researchers kept making models bigger: new capabilities kept appearing.

Why foundation models matter

Foundation models reshaped how AI is built and deployed:

  • Reuse — one base model serves many tasks, slashing the cost of new applications.
  • Capability — broad pre-training gives strong performance even on tasks with little task-specific data.
  • Risk concentration — flaws, biases or vulnerabilities in the base model propagate to everything built on it, which is why evaluation and safety work focus heavily on the foundation layer.

In short, a foundation model is the general-purpose engine of modern AI: trained once on broad data, adapted endlessly, and capable of more than the sum of its training thanks to emergence at scale.
