Question 1

What is knowledge distillation in simple terms?

Accepted Answer

Knowledge distillation is a technique in which a smaller 'student' model is trained to mimic the behaviour of a large, capable 'teacher' model. Instead of learning only from raw labelled data, the student also learns from the teacher's full output distribution, which captures nuance the teacher picked up during its own training. The result is a much smaller model that retains much of the teacher's ability.
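As a rough illustration of what "learning from the teacher's full output distribution" can mean, the sketch below scores a student's prediction for a single example. It uses one common form of distillation loss: a weighted mix of cross-entropy against the true label and KL divergence against the teacher's distribution. The function names, the weight `alpha`, and the example probabilities are all assumptions made for illustration; real setups differ in how they weight and compute these terms.

```python
import math

def cross_entropy(student_probs, true_index):
    # Hard-label term: penalises low probability on the correct class only.
    return -math.log(student_probs[true_index])

def kl_divergence(teacher_probs, student_probs):
    # Soft-label term: penalises any mismatch with the teacher's distribution.
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )

def distillation_loss(student_probs, teacher_probs, true_index, alpha=0.5):
    # alpha is a hypothetical mixing weight between the two terms.
    hard = cross_entropy(student_probs, true_index)
    soft = kl_divergence(teacher_probs, student_probs)
    return alpha * hard + (1 - alpha) * soft

# Made-up probabilities over three classes; class 0 is the true label.
teacher = [0.70, 0.25, 0.05]
student_close = [0.65, 0.28, 0.07]  # similar shape to the teacher
student_far = [0.65, 0.05, 0.30]    # same top-1 confidence, different shape

loss_close = distillation_loss(student_close, teacher, true_index=0)
loss_far = distillation_loss(student_far, teacher, true_index=0)
```

Both students put the same probability on the correct class, so a hard-label loss alone cannot tell them apart. The teacher term is what makes `loss_far` larger than `loss_close`.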

Question 2

Why are 'soft targets' important?

Accepted Answer

Soft targets are the teacher's full probability distribution over possible answers, not just the single correct label. They carry 'dark knowledge': for example, that a given image is mostly a dog but slightly cat-like, which tells the student which classes the teacher considers similar. Learning from these richer signals often lets a small student generalise better than training on hard labels alone.
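To make the dog-versus-cat example concrete, here is a minimal sketch that compares a hard label with soft targets computed from teacher scores. It also shows temperature scaling, which is often used in distillation to soften the distribution but is not mentioned in the answer above. The logits, class names, and temperature values are invented for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Dividing by a temperature above 1 flattens the distribution.
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["dog", "cat", "car"]
teacher_logits = [6.0, 3.0, -2.0]  # hypothetical teacher scores for one image

hard_label = [1.0, 0.0, 0.0]  # says only "dog"
soft_t1 = softmax(teacher_logits, temperature=1.0)
soft_t4 = softmax(teacher_logits, temperature=4.0)

for name, hard, p1, p4 in zip(labels, hard_label, soft_t1, soft_t4):
    print(f"{name}: hard={hard:.2f} T=1: {p1:.3f} T=4: {p4:.3f}")
```

The hard label gives "cat" and "car" the same value of zero. The soft targets rank "cat" well above "car", and the higher temperature makes that ranking more visible to the student.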

Question 3

How is distillation different from pruning or quantization?

Accepted Answer

All three shrink models, but they work differently. Distillation trains a new, smaller model to imitate a larger one. Pruning removes weights or neurons judged unimportant from an existing model. Quantization reduces the numerical precision of weights (for example from 16-bit to 4-bit) to cut memory use and speed up inference. The three are complementary and are often combined.
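The sketch below contrasts the two methods that modify an existing model in place, using a toy list of weights. Pruning is shown as simple magnitude thresholding and quantization as symmetric uniform rounding. Both are simplified stand-ins chosen for illustration, and the function names, threshold, and weights are assumptions. Distillation is left out because it needs a training loop rather than a transformation of existing weights.

```python
def prune_by_magnitude(weights, threshold):
    # Pruning: zero out weights whose magnitude falls below the threshold.
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize_uniform(weights, bits):
    # Quantization: map each weight to a small signed integer code.
    levels = 2 ** (bits - 1) - 1  # e.g. 7 for 4 bits (codes in -7..7)
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / levels if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    # Recover approximate weights from the integer codes.
    return [c * scale for c in codes]

weights = [0.82, -0.03, 0.40, 0.005, -0.67]  # made-up example weights

pruned = prune_by_magnitude(weights, threshold=0.05)
codes, scale = quantize_uniform(weights, bits=4)
restored = dequantize(codes, scale)
```

Pruning keeps the surviving weights exact and drops the rest. Quantization keeps every weight but stores each one approximately, with an error of at most half a quantization step in this scheme.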

Question 4

Do distilled models lose quality?

Accepted Answer

Usually they lose some, but often less than the size reduction would suggest. A well-distilled student can recover most of the teacher's performance on the tasks it was distilled for while being a fraction of the size and cost. The trade-off is that students tend to be narrower: strong where the distillation focused, and weaker on capabilities outside that distribution.

Knowledge Distillation in AI: How Smaller Models Learn from Larger Ones

The problem distillation solves

Teacher and student: how the training works

Soft targets and “dark knowledge”

Where distillation shows up

Distillation vs other compression methods