Question 1

What is pre-training?

Accepted Answer

Pre-training is the first phase of building a large language model, in which the model learns from enormous amounts of unlabelled text by repeatedly predicting the next token. This gives the model broad language ability and world knowledge before any task-specific tuning.
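
To make "predicting the next token" concrete, here is a minimal sketch in which a toy bigram counter stands in for the model. This is an assumption for illustration only: real large language models are neural networks trained by gradient descent, and the names `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower seen in training, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy "corpus": whitespace-separated words stand in for tokens.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

The same idea, learning which token tends to come next purely from raw text, is what pre-training does at a vastly larger scale and with a far more expressive model.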

Question 2

Why is pre-training called self-supervised?

Accepted Answer

No human labels are needed: the data labels itself. For each chunk of text, the next token is the 'correct answer', so the training targets come directly from the raw text. This is what lets pre-training scale to trillions of tokens.
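
A short sketch of how raw text can supply its own labels: each training example pairs a window of tokens with the token that follows it. The function name `make_training_pairs` and the fixed `context_size` are assumptions for illustration, and splitting on whitespace stands in for a real tokenizer.

```python
def make_training_pairs(tokens, context_size):
    """Slide a window over the tokens; the token after each window is the target."""
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i:i + context_size]
        target = tokens[i + context_size]
        pairs.append((context, target))
    return pairs


# Whitespace splitting is a stand-in for a real tokenizer.
tokens = "pre-training learns from raw text".split()
pairs = make_training_pairs(tokens, context_size=2)

for context, target in pairs:
    print(context, "->", target)
```

No annotator is involved at any point: every (context, target) pair is read straight off the text.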

Question 3

How much does pre-training cost?

Accepted Answer

Pre-training a frontier model requires enormous compute, often on the order of 10^24 to 10^25 floating-point operations (FLOPs) and tens of millions of dollars in GPU time. This cost is the main reason only a handful of organisations pre-train the largest models from scratch.
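
As a rough back-of-the-envelope sketch, training compute for a dense transformer is often approximated as 6 x parameters x training tokens. That rule of thumb is not stated in the answer above, and every number below (model size, token count, sustained per-GPU throughput, hourly price) is a hypothetical placeholder, not a figure for any real model or provider.

```python
def estimate_training_flops(num_params, num_tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6 * num_params * num_tokens


def estimate_gpu_hours(total_flops, sustained_flops_per_gpu):
    """GPU-hours needed at a given sustained throughput (FLOP/s per GPU)."""
    return total_flops / sustained_flops_per_gpu / 3600


def estimate_cost_usd(gpu_hours, usd_per_gpu_hour):
    """Dollar cost at a flat hourly price per GPU."""
    return gpu_hours * usd_per_gpu_hour


# All values are hypothetical placeholders chosen for illustration.
NUM_PARAMS = 400e9          # 400 billion parameters
NUM_TOKENS = 4e12           # 4 trillion training tokens
SUSTAINED_FLOPS = 4e14      # assumed sustained FLOP/s per GPU
USD_PER_GPU_HOUR = 2.0      # assumed price per GPU-hour

flops = estimate_training_flops(NUM_PARAMS, NUM_TOKENS)
gpu_hours = estimate_gpu_hours(flops, SUSTAINED_FLOPS)
cost = estimate_cost_usd(gpu_hours, USD_PER_GPU_HOUR)

print(f"{flops:.2e} FLOPs, {gpu_hours:,.0f} GPU-hours, ${cost:,.0f}")
```

Under these assumed inputs the estimate lands near 10^25 FLOPs and in the tens of millions of dollars, consistent with the orders of magnitude quoted above. Real costs also depend on hardware utilisation, failed runs, and experimentation, none of which this sketch models.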

Question 4

What is the difference between pre-training and fine-tuning?

Accepted Answer

Pre-training builds general capability from scratch on huge generic corpora and is very expensive. Fine-tuning then adapts that pre-trained model to a specific task or style on a much smaller dataset, at a tiny fraction of the cost.
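
To put a rough number on "a tiny fraction", the sketch below compares the two phases using the common approximation that training compute is about 6 x parameters x tokens. It assumes full fine-tuning of the same model, so the ratio reduces to the ratio of token counts. The dataset sizes are hypothetical, and parameter-efficient methods would lower the fine-tuning cost further.

```python
def training_flops(num_params, num_tokens):
    """Rule-of-thumb compute estimate: about 6 FLOPs per parameter per token."""
    return 6 * num_params * num_tokens


# Hypothetical sizes chosen only to illustrate the gap in scale.
NUM_PARAMS = 400e9         # same model in both phases
PRETRAIN_TOKENS = 4e12     # trillions of tokens of generic text
FINETUNE_TOKENS = 1e8      # a much smaller task-specific dataset

pretrain = training_flops(NUM_PARAMS, PRETRAIN_TOKENS)
finetune = training_flops(NUM_PARAMS, FINETUNE_TOKENS)
ratio = finetune / pretrain

print(f"fine-tuning uses about {ratio:.1e} of the pre-training compute")
```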

Pre-Training (AI Glossary)

Definition

Self-supervised learning

Training compute and FLOPs

What pre-training produces

Pre-training vs. fine-tuning