Question 1

What is model pruning?

Accepted Answer

Model pruning is the process of removing parts of a trained neural network — individual weights or whole structures like neurons and attention heads — that contribute little to its predictions. The result is a smaller, sparser model that needs less memory and compute while keeping most of its accuracy.
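
As a rough illustration of the idea, the sketch below zeroes out the smallest weights in a flat list. Two things here are assumptions made for the example rather than part of the answer above: using absolute magnitude as a stand-in for "contributes little", and the helper name magnitude_prune, which is hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by |w| ascending; the first n_prune are the ones removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Half of the entries are now zero, so a sparse storage format would need to keep only the three surviving values and their positions.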

Question 2

What is the difference between structured and unstructured pruning?

Accepted Answer

Unstructured pruning removes individual weights wherever they are. The weight matrix keeps its shape but becomes sparse, and standard hardware often cannot turn that sparsity into a speedup without specialised kernels. Structured pruning removes whole units (entire neurons, channels, heads, or layers), so the remaining weight matrices stay dense but get smaller, which typically translates into real speedups on ordinary GPUs and CPUs.
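
The difference is easiest to see in the shape of the result. The sketch below is a minimal illustration, assuming each row of a small weight matrix holds one neuron's incoming weights. The function names, the magnitude threshold, and the L1-norm ranking of neurons are choices made for this example, not a prescribed method.

```python
def unstructured_prune(matrix, sparsity):
    """Zero the smallest-magnitude entries; the matrix keeps its shape."""
    magnitudes = sorted(abs(w) for row in matrix for w in row)
    n_prune = int(len(magnitudes) * sparsity)
    if n_prune == 0:
        return [list(row) for row in matrix]
    threshold = magnitudes[n_prune - 1]
    # Entries tied with the threshold are pruned too, so ties can raise sparsity slightly.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in matrix]


def structured_prune(matrix, n_remove):
    """Drop the n_remove rows (neurons) with the smallest L1 norm; the matrix shrinks."""
    norms = [sum(abs(w) for w in row) for row in matrix]
    weakest = set(sorted(range(len(matrix)), key=lambda i: norms[i])[:n_remove])
    return [list(row) for i, row in enumerate(matrix) if i not in weakest]


W = [
    [0.9, -0.1, 0.4],
    [0.05, 0.02, -0.03],
    [-0.7, 0.6, 0.01],
    [0.2, -0.8, 0.3],
]

sparse = unstructured_prune(W, sparsity=0.5)  # still 4 x 3, half the entries are zero
dense = structured_prune(W, n_remove=1)       # 3 x 3, the weakest neuron is gone

print(len(sparse), "x", len(sparse[0]))
print(len(dense), "x", len(dense[0]))
```

In a real network, removing a neuron also means removing the matching input column in the next layer, which this sketch leaves out.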

Question 3

What is the lottery ticket hypothesis?

Accepted Answer

The lottery ticket hypothesis proposes that a large, randomly initialised network contains a small subnetwork (a "winning ticket") that, when reset to its original initial weights and trained in isolation, can match the full network's accuracy in a comparable number of training steps. It suggests that much of a big model's capacity is redundant and that good sparse models exist inside dense ones.
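
The procedure usually used to look for a winning ticket is: train the dense model, build a mask from the trained weights, rewind the surviving weights to their saved initial values, and retrain with the mask applied. The sketch below walks through those steps on a toy linear model. That is a large simplification, since the hypothesis concerns neural networks, and every name in it (train, mse, the synthetic data) is made up for illustration.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(weights, mask, data, lr=0.05, epochs=200):
    """Plain SGD on squared error; weights where mask is 0 are held at zero."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [(wi - lr * err * xi) * mi for wi, xi, mi in zip(w, x, mask)]
    return w


rng = random.Random(0)
true_w = [2.0, 0.0, -1.5, 0.0, 0.0, 0.0]  # only two inputs actually matter
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1) for _ in true_w]
    data.append((x, predict(true_w, x)))

# 1. Save the random initialisation and train the dense model.
init = [rng.uniform(-0.5, 0.5) for _ in true_w]
dense = train(init, [1] * len(init), data)

# 2. Build a mask that keeps the k largest-magnitude trained weights.
k = 2
keep = set(sorted(range(len(dense)), key=lambda i: abs(dense[i]), reverse=True)[:k])
mask = [1 if i in keep else 0 for i in range(len(dense))]

# 3. Rewind to the saved initialisation and train only the surviving weights.
ticket = train(init, mask, data)

print("mask:", mask)
print("dense loss:", mse(dense, data), "ticket loss:", mse(ticket, data))
```

The key detail is step 3: the sparse model starts from init, not from the trained dense weights and not from a fresh random draw.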

Question 4

Do you have to retrain after pruning?

Accepted Answer

Usually yes. Pruning typically lowers accuracy right away, more so at higher sparsity, so the standard workflow is to prune and then fine-tune, letting the remaining weights adjust to compensate. Iterating (prune a little, fine-tune, repeat) generally reaches a higher sparsity at a given accuracy than pruning everything in one shot.
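
The iterative loop can be sketched as follows. This is a minimal illustration on a toy linear model with synthetic data. The helper names, the 50% per-round pruning fraction, and the number of rounds are all assumptions chosen to keep the example short.

```python
import random


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, mask, data, lr=0.05, epochs=50):
    """SGD on squared error; pruned weights (mask 0) stay at zero."""
    w = [wi * mi for wi, mi in zip(w, mask)]
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y
            w = [(wi - lr * err * xi) * mi for wi, xi, mi in zip(w, x, mask)]
    return w


def prune_step(w, mask, fraction):
    """Mask out `fraction` of the still-active weights with the smallest magnitude."""
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * fraction)
    drop = set(sorted(active, key=lambda i: abs(w[i]))[:n_prune])
    return [0 if i in drop else m for i, m in enumerate(mask)]


rng = random.Random(1)
true_w = [2.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0]
data = []
for _ in range(60):
    x = [rng.uniform(-1, 1) for _ in true_w]
    data.append((x, predict(true_w, x)))

mask = [1] * len(true_w)
w = [rng.uniform(-0.5, 0.5) for _ in true_w]
w = fine_tune(w, mask, data, epochs=200)  # initial dense training

for round_no in range(2):
    mask = prune_step(w, mask, fraction=0.5)  # prune a little
    w = fine_tune(w, mask, data)              # let the survivors adjust
    print("round", round_no + 1, "active weights:", sum(mask), "loss:", mse(w, data))
```

Each round removes half of the weights that are still active and then fine-tunes, so the model goes from 8 to 4 to 2 active weights instead of dropping to 2 in a single step.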

What Is Model Pruning in AI?

What pruning does

Magnitude pruning

Structured versus unstructured pruning

The lottery ticket hypothesis

Practical pruning workflows for LLMs