Question 1

Which model is the safest?

Accepted Answer

There is no single safest model, because safety has two competing dimensions: refusing genuinely harmful requests, and not over-refusing benign ones. A model that refuses everything is safe but useless. The leading models all perform well on clear-cut harmful requests and differ mainly in how they handle ambiguous edge cases.

Question 2

What is over-refusal and why does it matter?

Accepted Answer

Over-refusal is when a model declines a perfectly legitimate request because it pattern-matches to something risky — refusing to explain how a medication works, or to write fiction involving conflict. It matters because excessive refusals frustrate users and push them toward less-safe tools, so labs actively try to reduce it without lowering protection on real harms.

Question 3

Do safety behaviours change between model versions?

Accepted Answer

Yes, frequently. Each model update retunes refusal thresholds, often loosening over-cautious behaviour while tightening genuine gaps. Any specific comparison is a snapshot in time, so the patterns matter more than exact verdicts: look at how a model reasons about a request, not just whether it said yes or no on one day.

Question 4

How do labs measure safety?

Accepted Answer

With standardized batteries of prompts spanning clearly harmful, benign-but-sensitive, and ambiguous categories, scored for correct refusal, over-refusal, and consistency. They publish results in model and system cards, run internal and external red-teaming, and track refusal rates in real usage. No single number captures it — it is always a balance of metrics.

AI Safety Comparison: How ChatGPT, Claude, and Gemini Handle Harmful Requests

Safety is a balance, not a single score

How the three leading models compare

The over-refusal problem

How to read safety claims