Question 1

What is the main difference between pre-training and RLHF?

Accepted Answer

Pre-training teaches a model language and world knowledge by having it predict the next token across trillions of tokens of internet text. RLHF (reinforcement learning from human feedback) teaches a model behaviour: to be helpful, safe, and instruction-following, by optimising against human preferences. Pre-training builds capability; RLHF aligns that capability with what humans actually want.
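
To make the contrast between the two objectives concrete, here is a minimal sketch in plain Python. It assumes one common formulation: average negative log-likelihood for next-token prediction, and a Bradley-Terry style pairwise loss for learning from a human comparison. The function names and the example numbers are illustrative assumptions, not taken from any particular training pipeline.

```python
import math


def next_token_loss(probs_of_true_tokens):
    """Pre-training style objective (sketch).

    Takes the probability the model assigned to each token that actually
    came next, and returns the average negative log-likelihood.
    Lower means better prediction.
    """
    total = sum(math.log(p) for p in probs_of_true_tokens)
    return -total / len(probs_of_true_tokens)


def preference_loss(reward_chosen, reward_rejected):
    """Preference style objective (sketch, Bradley-Terry form).

    Takes scalar scores for the response a human preferred and the one
    they rejected, and returns -log(sigmoid(chosen - rejected)).
    Lower means the scores agree more strongly with the human choice.
    Note: not numerically hardened for very large negative margins.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


# Hypothetical numbers, for illustration only.
prediction_loss = next_token_loss([0.9, 0.6, 0.8])
agrees_with_human = preference_loss(2.0, 0.0)
disagrees_with_human = preference_loss(0.0, 2.0)
```

The first function only asks "did you predict the text?", while the second only asks "did you rank the preferred answer higher?". Neither input to `preference_loss` carries any new facts about the world, which is one way to see why this stage shapes behaviour rather than knowledge.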

Question 2

Which stage uses more compute and data?

Accepted Answer

Pre-training is vastly larger on both counts. It processes trillions of tokens and, for frontier-scale models, can cost tens to hundreds of millions of dollars in compute. RLHF uses orders of magnitude less data (typically thousands to hundreds of thousands of human comparisons) and far less compute, because it refines an existing model rather than building one from scratch.
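
A rough back-of-the-envelope calculation shows what "orders of magnitude" means here. Every figure below is a hypothetical round number chosen for illustration (in particular, the tokens-per-comparison value is an assumption), so treat the result as a sense of scale rather than a measurement.

```python
import math


def orders_of_magnitude_gap(larger, smaller):
    """Return how many powers of ten separate two positive quantities."""
    return math.log10(larger / smaller)


# Hypothetical round numbers, not figures for any real model.
pretraining_tokens = 2e12        # "trillions of tokens"
human_comparisons = 100_000      # upper end of "hundreds of thousands"
tokens_per_comparison = 1_000    # assumed: one prompt plus two responses

# Total text touched by the comparison data, in tokens.
rlhf_tokens = human_comparisons * tokens_per_comparison

gap = orders_of_magnitude_gap(pretraining_tokens, rlhf_tokens)
print(f"Pre-training data is roughly 10^{gap:.1f} times larger")
```

Under these assumptions the gap comes out at a little over four orders of magnitude; with fewer comparisons or shorter responses it would be larger still.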

Question 3

Can you have one stage without the other?

Accepted Answer

You can pre-train without RLHF, and the result is a capable but unaligned base model that predicts text rather than following instructions. You cannot meaningfully run RLHF without pre-training first, because RLHF refines behaviour on top of knowledge the base model already has — it adds almost no new world knowledge of its own.

Question 4

Does RLHF make the model smarter?

Accepted Answer

Not in terms of raw knowledge. RLHF rarely adds new facts or capabilities; the model's intelligence largely comes from pre-training. What RLHF does is make the existing capability accessible and usable: surfacing the right answer, following the instruction, and behaving safely. It often makes a model feel much smarter even though the underlying knowledge was already there.

Pre-Training vs RLHF: The Two Stages That Make an AI Assistant

Two stages, two completely different goals

Objective: prediction vs preference

Data and compute: web-scale vs hand-curated

Why you need both

The practical takeaway