Question 1

What is Constitutional AI in simple terms?

Accepted Answer

Constitutional AI is a training method in which a model is given a written set of principles (a constitution) and learns to critique and revise its own answers against them. The goal is a model that is both helpful and harmless, trained with far fewer human labels on harmful content.
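As a rough illustration of the critique-and-revise idea, here is a minimal Python sketch. It is not Anthropic's implementation: `critique_and_revise`, `toy_model`, and the prompt formats are hypothetical names invented for this example, and `toy_model` is a hard-coded stand-in for a real language model call.

```python
from typing import Callable, List


def critique_and_revise(
    draft: str,
    principles: List[str],
    model: Callable[[str], str],
) -> str:
    """Critique a draft against each principle in turn, then revise it."""
    response = draft
    for principle in principles:
        # Step 1: ask the model to critique the current response.
        critique = model(f"CRITIQUE\nPRINCIPLE: {principle}\nRESPONSE: {response}")
        # Step 2: ask the model to rewrite the response in light of the critique.
        response = model(f"REVISE\nCRITIQUE: {critique}\nRESPONSE: {response}")
    return response


def toy_model(instruction: str) -> str:
    """Hypothetical stand-in for a language model (assumption, not a real API)."""
    if instruction.startswith("CRITIQUE"):
        return "contains an insult" if "idiot" in instruction else "no issues"
    if instruction.startswith("REVISE"):
        response = instruction.split("RESPONSE: ", 1)[1]
        return response.replace(", you idiot", "")
    raise ValueError("unknown instruction")


principles = ["Prefer responses that are helpful, honest, and harmless."]
revised = critique_and_revise("Restart the router, you idiot.", principles, toy_model)
print(revised)  # prints "Restart the router."
```

In the published method, revised answers like this one are collected and used as fine-tuning data, so the model learns to produce the improved answer directly.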

Question 2

How is Constitutional AI different from RLHF?

Accepted Answer

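Standard RLHF (reinforcement learning from human feedback) relies on humans comparing and ranking model responses, including harmful ones. Constitutional AI replaces much of that human feedback with feedback from an AI model guided by the constitution, an approach known as RLAIF (reinforcement learning from AI feedback). This scales better and reduces how much harmful text humans must review.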
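To make the AI-feedback step concrete, the sketch below shows how an AI judge, rather than a human, might label which of two responses better follows a principle. This is an illustrative assumption, not Anthropic's code: `label_preference`, `toy_judge`, and the question format are hypothetical, and `toy_judge` is a hard-coded stand-in for a real model.

```python
from typing import Callable, Dict


def label_preference(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    judge: Callable[[str], str],
) -> Dict[str, str]:
    """Ask an AI judge which response better follows the principle."""
    question = (
        f"PRINCIPLE: {principle}\n"
        f"PROMPT: {prompt}\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "Which response is better? Answer A or B."
    )
    choice = judge(question).strip().upper()
    if choice not in {"A", "B"}:
        raise ValueError(f"unexpected judge answer: {choice!r}")
    if choice == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    # The (chosen, rejected) pair plays the role a human ranking plays in RLHF.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


def toy_judge(question: str) -> str:
    """Hypothetical stand-in for a model: rejects option A if it is insulting."""
    option_a = question.split("(A) ", 1)[1].split("\n(B) ", 1)[0]
    return "B" if "idiot" in option_a else "A"


pair = label_preference(
    prompt="How do I fix my wifi?",
    response_a="Figure it out, you idiot.",
    response_b="Try restarting the router.",
    principle="Prefer responses that are helpful, honest, and harmless.",
    judge=toy_judge,
)
print(pair["chosen"])  # prints "Try restarting the router."
```

Preference pairs of this shape can then train a preference model in the same way human rankings do in RLHF. The difference is where the labels come from.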

Question 3

What is in the constitution?

Accepted Answer

A list of plain-language principles drawn from sources such as the UN Universal Declaration of Human Rights and platform guidelines. A typical principle asks the model to prefer responses that are helpful, honest, and harmless. The model uses these principles to judge and improve its own outputs.

Question 4

Which models use Constitutional AI?

Accepted Answer

Anthropic's Claude models are trained with Constitutional AI. The technique was introduced by Anthropic in 2022 and has been refined in later model generations.

What Is Constitutional AI? Anthropic's Safety Approach Explained

What is Constitutional AI?

Why it was created

How it works — two stages

What is in the constitution?

Why it matters