Question 1

What does the B in 7B or 70B mean?

Accepted Answer

The B stands for billion, and the number counts the model's trainable parameters — the internal weights it adjusts during training. A 7B model has roughly 7 billion of these numbers, a 70B model has about ten times more. More parameters generally means more capacity to store patterns and knowledge, but also more memory and compute to run.

Question 2

How much VRAM does a 7B model need?

Accepted Answer

At full 16-bit precision a 7B model needs about 14 GB of VRAM just for the weights, plus extra for context. Quantised to 4-bit it can fit in roughly 4-5 GB, which is why quantisation is so popular for consumer GPUs. A 70B model needs about 140 GB at 16-bit or around 40 GB at 4-bit, usually requiring multiple GPUs.

Question 3

Is a bigger model always better?

Accepted Answer

No. Bigger models are generally more capable on hard reasoning and broad knowledge, but they are slower, costlier, and overkill for simple tasks. A well-tuned 7B or 13B model can match or beat an older 70B model, and a small model is often the right choice for classification, extraction, or latency-sensitive work.

Question 4

What is quantisation and why does it matter for size?

Accepted Answer

Quantisation stores each parameter using fewer bits — for example 4-bit instead of 16-bit — cutting memory roughly fourfold with only a small quality loss. It is the main reason large models can run on modest hardware. The parameter count stays the same; only the precision of each number changes.

Model Size Explained: 7B, 13B, 70B — What Do These Numbers Mean?

What the number actually counts

Why size drives memory and cost

How quantisation shrinks the footprint

Size versus actual capability

Choosing the right size