Two stages, two completely different goals
Modern AI assistants are not trained in one shot. They emerge from at least two distinct phases that do very different jobs. Pre-training gives the model its raw intelligence: knowledge of the world, language, and reasoning patterns, learned by predicting the next token across a huge corpus. RLHF — Reinforcement Learning from Human Feedback — gives the model its manners: the tendency to be helpful, honest, harmless, and to actually follow the instruction it was given. Confusing the two leads to a common misunderstanding, that alignment makes models smarter. In reality, capability and behaviour are produced by separate stages, and you need both to get a usable assistant.
Objective: prediction vs preference
The two stages optimise fundamentally different things. Pre-training optimises a self-supervised objective: minimise the cross-entropy loss of next-token prediction, where the “correct answer” is simply the next token already present in the text. RLHF optimises a preference-based objective: maximise a reward derived from humans ranking which responses they prefer, typically by way of a reward model trained on those rankings. One learns from the structure of language itself; the other learns from explicit human judgement about quality and safety. This difference in objective is the root of most of the other differences between them.
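To make the contrast concrete, here is a minimal sketch of the two loss signals in plain Python. It assumes the common formulations: cross-entropy on the observed next token for pre-training, and a Bradley-Terry style pairwise loss for training a reward model on human comparisons. The function names, the toy vocabulary, and the example numbers are all illustrative, not taken from any particular system.

```python
import math

def next_token_loss(predicted_probs, actual_next_token):
    """Pre-training signal: cross-entropy at one position.

    The label is just the token that actually came next in the corpus,
    so no human annotation is needed.
    """
    return -math.log(predicted_probs[actual_next_token])

def preference_loss(reward_chosen, reward_rejected):
    """Preference signal: pairwise loss, -log(sigmoid(r_chosen - r_rejected)).

    The label is a human judgement that one response beats another.
    Written as log(1 + exp(-margin)), which is the same quantity.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))

# Toy next-token distribution after "The cat sat on the" (made-up numbers).
probs = {"mat": 0.7, "dog": 0.2, "moon": 0.1}
print(next_token_loss(probs, "mat"))    # low loss: the model expected "mat"
print(next_token_loss(probs, "moon"))   # higher loss: "moon" was unlikely

# Hypothetical reward-model scores for two responses to the same prompt.
print(preference_loss(2.0, 0.5))        # small loss: preferred response scored higher
print(preference_loss(0.5, 2.0))        # larger loss: ranking is the wrong way round
```

In this sketch the first loss only ever asks "what came next?", while the second only ever asks "which response did people prefer?". Neither function sees the other's kind of label.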
Data and compute: web-scale vs hand-curated
The scale gap between the two stages is enormous. Pre-training consumes trillions of tokens of raw text and runs for weeks or months on thousands of accelerators, with costs reaching tens or hundreds of millions of dollars. RLHF, by contrast, runs on a comparatively tiny dataset, often thousands to a few hundred thousand human comparisons, and uses a small fraction of the compute, because it fine-tunes an already-trained model rather than building one. Pre-training data is cheap and abundant per token (most text can contribute, although corpora are usually filtered and deduplicated first); RLHF data is expensive and scarce, because it requires careful human labelling. This is why the two stages have such different economics, and why far more organisations can afford RLHF-style tuning than can afford to pre-train from scratch.
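A back-of-the-envelope calculation gives a feel for the data gap. Every figure below is an assumption chosen to fall inside the ranges mentioned above (the corpus size, the number of comparisons, and especially the average response length are illustrative, not measurements), and the sketch compares data volume only, not compute.

```python
# Illustrative assumptions only; none of these are measured values.
PRETRAIN_TOKENS = 2_000_000_000_000  # "trillions of tokens": assume 2 trillion
RLHF_COMPARISONS = 100_000           # inside "thousands to a few hundred thousand"
RESPONSES_PER_COMPARISON = 2         # assume each comparison ranks two responses
TOKENS_PER_RESPONSE = 500            # assumed average length, prompt included

def rlhf_token_count(comparisons, responses_per_comparison, tokens_per_response):
    """Rough number of tokens that human labellers had to read and judge."""
    return comparisons * responses_per_comparison * tokens_per_response

rlhf_total = rlhf_token_count(
    RLHF_COMPARISONS, RESPONSES_PER_COMPARISON, TOKENS_PER_RESPONSE
)
ratio = PRETRAIN_TOKENS / rlhf_total

print(f"RLHF tokens: {rlhf_total:,}")                        # RLHF tokens: 100,000,000
print(f"Pre-training corpus is about {ratio:,.0f}x larger")  # about 20,000x larger
```

Under these assumptions the pre-training corpus is four orders of magnitude larger, yet every one of the RLHF tokens needed a person to read and judge it, which is where its cost comes from.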
Why you need both
Each stage is useless alone for building an assistant. A pre-trained-only model is brilliant but unruly: ask it a question and it may answer, may pose more questions, or may drift into unrelated text, because it was only ever taught to continue plausible text. Trying to run RLHF without pre-training would be pointless — there would be no knowledge or capability to align, since RLHF adds almost no new facts. The two are complementary: pre-training supplies the capability, RLHF makes it usable. The assistant you interact with is the product of capability shaped by alignment.
The practical takeaway
If a model gives a wrong fact, that usually traces back to pre-training — the knowledge was missing or muddled in the base model, and no amount of preference tuning reliably invents it. If a model is rude, refuses reasonable requests, ignores instructions, or is unsafe, that usually traces back to RLHF and alignment — its behaviour was not shaped well. Knowing which stage owns which kind of behaviour helps you set realistic expectations: alignment can make a model far more pleasant and trustworthy to use, but it cannot turn a small or poorly-pre-trained base into a genuinely knowledgeable one.