Question 1

What is next-token prediction?

Accepted Answer

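Next-token prediction is the training objective in which a model is shown a sequence of tokens and asked to predict the token that comes next. The model outputs a probability for every token in its vocabulary; this prediction is compared to the actual next token, and the model's parameters are adjusted to make the correct token more probable (typically by minimising the cross-entropy loss, the negative log-probability the model assigned to the correct token). Repeated over trillions of tokens, this is how most LLMs learn during pretraining.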
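The update described above can be sketched in a few lines of Python. This is a toy under stated assumptions: a hypothetical four-token vocabulary, made-up starting scores (logits), and gradient descent applied to the logits directly, whereas a real LLM backpropagates the same loss into the network weights that produce the logits. The function names are illustrative, not taken from any library.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target):
    """Cross-entropy loss: negative log-probability of the actual next token."""
    return -math.log(softmax(logits)[target])

def training_step(logits, target, lr=1.0):
    """One gradient-descent step on the logits themselves (a simplification).

    For softmax + cross-entropy, d(loss)/d(logit_i) = p_i - 1[i == target],
    so this step raises the target's logit and lowers all the others.
    """
    probs = softmax(logits)
    grads = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [x - lr * g for x, g in zip(logits, grads)]

# Hypothetical vocabulary and scores; the actual next token is "mat".
vocab = ["the", "cat", "mat", "sat"]
logits = [0.5, 1.0, 0.2, 0.1]
target = vocab.index("mat")

for step in range(3):
    p = softmax(logits)[target]
    loss = next_token_loss(logits, target)
    print(f"step {step}: P(mat) = {p:.3f}, loss = {loss:.3f}")
    logits = training_step(logits, target)
```

Each step widens the gap between the correct token's score and every other score, so P(mat) rises and the loss falls, which is the "make the correct token more probable" adjustment in miniature.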

Question 2

Why is it called self-supervised?

Accepted Answer

It is self-supervised because the labels come from the data itself rather than from human annotation. The "correct answer" for each position is simply the token that actually appears next in the text; for example, in "the cat sat on the mat", the label for the context "the cat sat on the" is "mat". As a result, virtually any amount of ordinary text becomes training data with no manual labelling required.
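A minimal sketch of how raw text becomes labelled examples with no annotator involved. It assumes whitespace splitting as a stand-in for a real tokenizer and a small fixed context window; `make_training_pairs` is a hypothetical helper written for illustration.

```python
def make_training_pairs(tokens, context_size):
    """Turn a token sequence into (context, next_token) training examples.

    No human labels are needed: the target for each context is simply
    the token that follows it in the text.
    """
    pairs = []
    for i in range(1, len(tokens)):
        # Use at most `context_size` preceding tokens as the input.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs

# Whitespace splitting stands in for a real tokenizer (an assumption).
tokens = "the cat sat on the mat".split()
for context, target in make_training_pairs(tokens, context_size=3):
    print(context, "->", target)
```

Every position in the text yields one example, which is why the amount of training data scales with the amount of text rather than with labelling effort.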

Question 3

How does predicting words teach a model facts and reasoning?

Accepted Answer

To predict the next token well across diverse text, a model must implicitly capture grammar, facts, relationships, and reasoning patterns, because those regularities are what make the next word predictable. For example, reliably completing "The capital of France is" requires the model to have stored that the answer is "Paris". Broad knowledge and reasoning ability therefore emerge as a side effect of getting good at this one prediction task.
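The link between prediction accuracy and stored facts can be illustrated with a deliberately crude stand-in: a count-based trigram model over a tiny hypothetical corpus. This is a swapped-in technique far simpler than a neural LLM; it only memorises exact word sequences and cannot generalise or reason, but it shows that lowering next-word error on this text forces the model to record which word follows "france is".

```python
from collections import Counter, defaultdict

def train_trigram(corpus):
    """Count which word follows each pair of words (a toy count-based model)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for a, b, nxt in zip(words, words[1:], words[2:]):
            counts[(a, b)][nxt] += 1
    return counts

def predict_next(counts, a, b):
    """Return the most frequent continuation of (a, b), or None if unseen."""
    followers = counts.get((a, b))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# A tiny hypothetical corpus.
corpus = [
    "the capital of france is paris",
    "the capital of japan is tokyo",
    "paris is the capital of france",
]
model = train_trigram(corpus)
print(predict_next(model, "france", "is"))  # → paris
```

A neural model trained on far more text has to compress such regularities into shared parameters instead of a lookup table, which is where generalisation comes from; the toy only captures the memorisation half of that story.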

Question 4

Is next-token prediction the same as how ChatGPT answers questions?

Accepted Answer

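Largely, yes. At inference time the model still predicts one token at a time, appends it to the text, and predicts again; this is called autoregressive generation. Question answering, summarising, and coding are all just the model continuing the text given your prompt, choosing each next token from its predicted probabilities (either the single most likely token or a random sample, depending on the decoding settings). Because every generated token is conditioned on the prompt, prompt wording matters a great deal.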
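The generation loop can be sketched as follows. The "model" here is a hypothetical lookup table that conditions only on the last token, whereas a real LLM conditions on the whole context; the loop structure (predict, pick, append, repeat) is the part being illustrated, with greedy decoding and random sampling shown as two ways to pick.

```python
import random

# Hypothetical toy "model": for each previous token, a probability
# distribution over the next token. A real LLM uses the full context.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.9, "a": 0.1},
    "the": {"cat": 0.6, "dog": 0.4},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.5, "<end>": 0.5},
    "sat": {"<end>": 1.0},
}

def generate(prompt, max_new_tokens=10, greedy=True, rng=None):
    """Autoregressive generation: predict one token, append it, repeat."""
    rng = rng or random.Random(0)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS[tokens[-1]]  # "predict" from current context
        if greedy:
            nxt = max(probs, key=probs.get)  # pick the most likely token
        else:
            choices, weights = zip(*probs.items())
            nxt = rng.choices(choices, weights=weights)[0]  # sample instead
        if nxt == "<end>":
            break
        tokens.append(nxt)  # feed the output back in as new context
    return tokens

print(generate(["<start>"]))                # greedy decoding
print(generate(["<start>"], greedy=False))  # sampled decoding
```

Greedy decoding always returns the same continuation for the same prompt, while sampling can vary from run to run; changing the prompt changes the starting context and therefore every prediction that follows.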

Next-Token Prediction (AI Glossary)

What next-token prediction is

Why it counts as self-supervised learning

Why a word-guessing game produces world knowledge

How it shows up at inference time

Limits baked into the objective