Question 1

How does ChatGPT actually work?

Accepted Answer

ChatGPT is a large transformer language model that predicts the next token given the conversation so far. It is built in stages: pre-training on a huge corpus of text, supervised fine-tuning on example conversations, and reinforcement learning from human feedback (RLHF) to align it with human preferences.
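
As a minimal sketch of what "predicts the next token given the conversation so far" means in practice, the loop below generates text one token at a time. The lookup table TOY_LOGITS and the function names are hypothetical stand-ins invented for illustration. A real transformer computes its scores from the entire context with a neural network rather than from the last token alone.

```python
import math
import random

# Hypothetical toy "model": scores (logits) for the next token, keyed by the
# last token only. A real transformer conditions on the whole context.
TOY_LOGITS = {
    "<start>": {"hello": 2.0, "hi": 1.0},
    "hello": {"world": 3.0, "there": 1.0},
    "hi": {"there": 2.0, "world": 0.5},
    "world": {"<end>": 4.0},
    "there": {"<end>": 4.0},
}


def softmax(logits):
    """Turn raw scores into a probability distribution over tokens."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def generate(prompt, max_new_tokens=10, greedy=True, seed=0):
    """Repeatedly pick a next token and append it to the context."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        if greedy:
            # Always take the most probable token.
            next_token = max(probs, key=probs.get)
        else:
            # Sample in proportion to probability.
            next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens


print(generate(["<start>"]))  # ['<start>', 'hello', 'world']
```

Passing greedy=False samples from the distribution instead of always taking the top token, which is one reason the same prompt can produce different replies.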

Question 2

What happens during pre-training?

Accepted Answer

During pre-training the model reads hundreds of billions of tokens of internet text, books, and code, learning to predict the next token at each position. This builds a broad statistical model of language, facts, and reasoning patterns, producing a 'base model' that is knowledgeable but not yet a helpful assistant.
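
The objective behind "learning to predict the next token at each position" is the average negative log-probability the model assigns to each token that actually comes next (cross-entropy). The sketch below illustrates that loss with a count-based bigram model, which is a deliberate simplification swapped in for illustration. Real pre-training minimises the same kind of loss by gradient descent on a neural network. The corpus, function names, and smoothing constant are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_prob(counts, prev, nxt, vocab_size, alpha=1.0):
    """Probability of nxt following prev, with add-alpha smoothing."""
    seen = counts[prev]
    return (seen[nxt] + alpha) / (sum(seen.values()) + alpha * vocab_size)


def cross_entropy(counts, tokens, vocab_size):
    """Average negative log-probability of each actual next token."""
    pairs = list(zip(tokens, tokens[1:]))
    total = -sum(
        math.log(next_token_prob(counts, prev, nxt, vocab_size))
        for prev, nxt in pairs
    )
    return total / len(pairs)


corpus = "the cat sat on the mat the cat ate".split()
vocab_size = len(set(corpus))  # 6 distinct tokens

untrained = defaultdict(Counter)  # no counts: every token is equally likely
trained = train_bigram(corpus)

print(cross_entropy(untrained, corpus, vocab_size))  # log(6), about 1.79
print(cross_entropy(trained, corpus, vocab_size))    # lower after "training"
```

The untrained model spreads probability evenly, so its loss equals the log of the vocabulary size. Any model that has picked up real patterns in the text scores lower.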

Question 3

What is the reward model in ChatGPT's training?

Accepted Answer

After supervised fine-tuning, humans rank multiple model responses to the same prompt. Those rankings train a separate reward model that predicts how much a human would prefer any given response. This reward model then guides reinforcement learning, scoring outputs so the assistant can be optimised at scale.
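
One common way to turn human rankings into a trainable signal is a pairwise loss: for each pair of responses, push the reward of the preferred one above the reward of the other. The sketch below assumes that formulation, with the loss -log(sigmoid(r_chosen - r_rejected)). The answer above does not specify the exact loss used for ChatGPT. The three hand-made features and the linear reward function are hypothetical. A real reward model is a neural network that scores the raw text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Hypothetical linear reward: a weighted sum of response features."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, chosen, rejected):
    """Small when the chosen response scores well above the rejected one."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))


def train(pairs, n_features, lr=0.5, epochs=200):
    """Plain gradient descent on the pairwise loss."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d(loss)/d(margin) = -(1 - sigmoid(margin)), so step the other way.
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights


# Made-up features: [answers_the_question, is_polite, rambles]
# Each pair is (response the labeller preferred, response ranked lower).
PAIRS = [
    ([1, 1, 0], [0, 1, 0]),
    ([1, 0, 0], [1, 0, 1]),
    ([1, 1, 0], [0, 0, 1]),
]

weights = train(PAIRS, n_features=3)
for chosen, rejected in PAIRS:
    print(reward(weights, chosen) > reward(weights, rejected))  # True
```

Once trained, a model of this kind can score responses no human has ranked, which is what lets the reinforcement learning stage run at scale.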

Question 4

Does ChatGPT understand what it says?

Accepted Answer

ChatGPT does not understand meaning the way humans do. It is a statistical next-token predictor that has learned extremely rich patterns from text. It can be fluent and frequently correct, but it has no grounded model of the world, which is why it can hallucinate confident but false statements.

How Does ChatGPT Work? A Deep Technical Explanation

Definition

Stage 1: Pre-training

Stage 2: Supervised fine-tuning

Stage 3: Reward modelling and RLHF

What happens when you hit send