Question 1

What is Constitutional AI?

Accepted Answer

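Constitutional AI (CAI) is Anthropic's training method in which the model improves its own behaviour against a written set of principles (a constitution) instead of relying solely on humans to label every harmful response. In a supervised stage, the model critiques and revises its own answers according to those principles and is fine-tuned on the revised answers. In a reinforcement-learning stage, the model compares pairs of responses against the principles, a preference (reward) model is trained on these AI-generated preferences, and the model is then optimised against it. The aim is to make training for harmlessness more scalable and the principles guiding it more transparent.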
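The critique-and-revise step can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `toy_model` is a hypothetical stub standing in for a real language model, `critique_and_revise` is a name chosen for this sketch, and the two principles are paraphrased for the example. One principle is sampled at random for each round, and the final revision is the kind of output the supervised stage would fine-tune on.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical model interface: prompt text in, completion text out.
Model = Callable[[str], str]

# Illustrative principles, paraphrased for this sketch (not Anthropic's wording).
PRINCIPLES = [
    "Point out ways the response is harmful, unethical, or dishonest.",
    "Point out ways the response is manipulative or disrespects the user's autonomy.",
]


def critique_and_revise(model: Model, prompt: str, n_rounds: int = 2,
                        seed: int = 0) -> Tuple[str, List[Tuple[str, str]]]:
    """Draft a response, then critique and revise it n_rounds times.

    Returns the final revision (a supervised fine-tuning target) and a log
    of every intermediate step.
    """
    rng = random.Random(seed)
    response = model(prompt)
    log = [("draft", response)]
    for _ in range(n_rounds):
        principle = rng.choice(PRINCIPLES)  # sample one principle per round
        critique = model(f"Prompt: {prompt}\nResponse: {response}\n"
                         f"Critique request: {principle}")
        response = model(f"Prompt: {prompt}\nResponse: {response}\n"
                         f"Critique: {critique}\n"
                         "Rewrite the response to address the critique.")
        log += [("critique", critique), ("revision", response)]
    return response, log


def toy_model(text: str) -> str:
    """Hypothetical stub standing in for an LLM so the control flow can run."""
    if "Rewrite the response" in text:
        return "revised answer"
    if "Critique request" in text:
        return "critique"
    return "draft answer"


final, log = critique_and_revise(toy_model, "How do I pick a lock?")
print(final)     # revised answer
print(len(log))  # 5
```

In a real pipeline `toy_model` would be replaced by calls to an actual model; keeping the log is one way to make each critique inspectable.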

Question 2

How does Claude's training differ from standard RLHF?

Accepted Answer

Standard RLHF trains a reward model on human comparisons of responses and then optimises the language model against that reward model with reinforcement learning. Claude's training adds a second source of feedback: comparisons made by an AI model guided by the constitution, an approach often called RLAIF (reinforcement learning from AI feedback). Humans still write the constitution and oversee the process, but far less human labelling is needed for the harmlessness stage, which makes that stage more scalable.
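Both RLHF and RLAIF train a reward model from pairwise comparisons; what changes is who labels the pairs. The sketch below is a toy under loud assumptions: each response is reduced to a hypothetical feature vector `[helpfulness, harmfulness]`, the reward model is linear, and `ai_judge` is a stub standing in for an AI model asked which response better follows a principle. It uses the common pairwise (Bradley-Terry style) loss, -log sigmoid(r(chosen) - r(rejected)), and omits the subsequent reinforcement-learning step.

```python
import math
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
# A judge returns True if response a is preferred over response b.
Judge = Callable[[Features, Features], bool]


def reward(w: Sequence[float], x: Features) -> float:
    """Toy linear reward model: r(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def collect_preferences(pairs: List[Tuple[Features, Features]],
                        judge: Judge) -> List[Tuple[Features, Features]]:
    """Order each pair as (chosen, rejected) using the judge.

    In RLHF the judge is a human labeller; in RLAIF it is an AI model
    prompted with a constitutional principle.
    """
    return [(a, b) if judge(a, b) else (b, a) for a, b in pairs]


def train_reward_model(prefs: List[Tuple[Features, Features]], dim: int,
                       lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Gradient descent on the pairwise loss -log sigmoid(r(chosen) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in prefs:
            margin = reward(w, chosen) - reward(w, rejected)
            # dLoss/dmargin = -(1 - sigmoid(margin)) = -sigmoid(-margin)
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wi + lr * g * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w


# Hypothetical features per response: [helpfulness, harmfulness].
pairs = [([0.9, 0.8], [0.7, 0.1]),
         ([0.5, 0.2], [0.6, 0.9]),
         ([0.3, 0.0], [0.4, 0.5])]


def ai_judge(a: Features, b: Features) -> bool:
    """Stub AI judge applying a 'prefer the less harmful response' principle."""
    return a[1] < b[1]


prefs = collect_preferences(pairs, ai_judge)
w = train_reward_model(prefs, dim=2)
print(w[1] < w[0] < 0)  # True: harmfulness is penalised most
```

Swapping `ai_judge` for a function backed by human labels turns the same pipeline back into the reward-modelling step of standard RLHF.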

Question 3

Is Claude a transformer like GPT?

Accepted Answer

Yes. At the architecture level Claude is a large decoder-style transformer trained to predict the next token, like GPT and other frontier models, although Anthropic has not published its full architectural details. What is distinctive is not the network's shape but the alignment training (Constitutional AI and the way feedback is gathered), which shapes how the model behaves rather than how it is structurally built.
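To make "decoder-style" and "trained to predict the next token" concrete, here is a generic textbook sketch in plain Python. It is not Claude's implementation, which is not public: it shows single-head attention with a causal mask (each position sees only itself and earlier positions) and the next-token cross-entropy loss. The function names and toy inputs are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with a causal mask.

    Position t attends only to positions 0..t, so no position can see a
    later token. Masked scores are simply never computed here, which is
    equivalent to setting them to -inf before the softmax.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out


def next_token_loss(probs: Matrix, tokens: List[int]) -> float:
    """Mean cross-entropy of predicting tokens[t + 1] from position t.

    probs[t] is the model's distribution over the vocabulary at position t.
    """
    losses = [-math.log(probs[t][tokens[t + 1]]) for t in range(len(tokens) - 1)]
    return sum(losses) / len(losses)


q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(q, k, v)
print(out[0])  # [1.0, 0.0]: the first position can only attend to itself

uniform = [[0.25] * 4 for _ in range(3)]
print(round(next_token_loss(uniform, [0, 1, 2]), 4))  # 1.3863, i.e. log(4)
```

Changing a later token's keys or values leaves every earlier output unchanged, which is what lets the same network be trained on all positions at once and then generate text one token at a time.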

Question 4

What is the AI constitution made of?

Accepted Answer

The constitution is a list of human-written principles drawn from sources such as human-rights declarations (including the UN Universal Declaration of Human Rights), platform guidelines, and Anthropic's own values. They instruct the model to be helpful, honest, and harmless, to avoid manipulation, and to respect autonomy. During training, the model uses these principles as the yardstick when critiquing and revising its own outputs, and when judging which of two responses is better.
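Besides critique and revision, the published CAI method also uses sampled principles to compare pairs of responses. The sketch below treats the constitution as plain data and builds such a comparison prompt, loosely modelled on a multiple-choice format. Everything here is an assumption for illustration: the `Principle` type, the three paraphrased principles (not the actual wording of Anthropic's constitution), and the text-based `parse_choice`; a real pipeline might instead use the judging model's probability for each option.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Principle:
    source: str  # where the principle was drawn from
    text: str    # the instruction shown to the judging model


# Illustrative principles paraphrasing the kinds of sources described above;
# not the actual wording of Anthropic's constitution.
CONSTITUTION: List[Principle] = [
    Principle("human-rights declaration",
              "Choose the response that most supports freedom, equality, and dignity."),
    Principle("platform guidelines",
              "Choose the response that contains the least harmful or objectionable content."),
    Principle("Anthropic's values",
              "Choose the response that is least manipulative and most respectful "
              "of the user's autonomy."),
]


def comparison_prompt(conversation: str, a: str, b: str, rng: random.Random) -> str:
    """Ask a judging model to pick between two responses using one sampled principle."""
    principle = rng.choice(CONSTITUTION)
    return (f"Consider the following conversation:\n{conversation}\n\n"
            f"{principle.text}\n(A) {a}\n(B) {b}\n"
            "Answer with (A) or (B).")


def parse_choice(judge_output: str) -> int:
    """Return 0 if the judge picked (A), 1 if it picked (B)."""
    picked_a, picked_b = "(A)" in judge_output, "(B)" in judge_output
    if picked_a == picked_b:
        raise ValueError(f"ambiguous judge output: {judge_output!r}")
    return 0 if picked_a else 1


prompt = comparison_prompt("User: How do I get revenge on my neighbour?",
                           "Here is a plan to damage their car...",
                           "It sounds like you're upset. Could you talk to them directly?",
                           random.Random(0))
print(parse_choice("The better response is (B)."))  # 1
```

Keeping the principles as data rather than hard-coding them is what lets the constitution be read, audited, and edited separately from the training code.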

How Anthropic's Claude Is Built: Constitutional AI and RLHF

What is actually distinctive

The standard recipe: pre-training plus RLHF

Constitutional AI: the key idea

RLAIF: feedback from AI, guided by humans

Why the approach matters