Model Size Explained: 7B, 13B, 70B — What Do These Numbers Mean?

Billions of parameters decoded: capacity, memory, and the quality trade-offs

Ad placeholder (leaderboard)

What the number actually counts

When you see a model labelled 7B, 13B, or 70B, the number refers to its parameter count in billions. Parameters are the internal weights the model learned during training — the numbers that get multiplied together as text flows through the network. A 7B model holds about 7 billion of them; a 70B model holds about 70 billion. Loosely, more parameters mean more capacity to memorise facts and represent subtle patterns, which is why larger models tend to reason better and know more. But the relationship is not linear, and parameter count is only one ingredient in quality.

Why size drives memory and cost

Each parameter must be stored in memory to run the model. At the common 16-bit precision, every parameter takes 2 bytes, so a 7B model needs roughly 14 GB of memory for its weights alone, a 13B model about 26 GB, and a 70B model about 140 GB. On top of that, the context window consumes additional memory that grows with prompt length. This is why a 70B model usually demands multiple high-end GPUs while a 7B model can run on a single consumer card. Memory, not raw speed, is the first wall most people hit.

How quantisation shrinks the footprint

Quantisation stores each parameter using fewer bits — typically 8-bit or 4-bit instead of 16-bit — cutting memory two- to four-fold. A 7B model quantised to 4-bit drops from about 14 GB to roughly 4-5 GB, letting it run on a laptop GPU. The quality loss is usually small, especially at 8-bit, which is why nearly every locally-run model is quantised. The parameter count is unchanged; only the precision of each stored number is reduced.

Size versus actual capability

Bigger is not automatically better. Modern training techniques, cleaner data, and longer training let recent 7B and 13B models rival older 70B models on many benchmarks. What matters is the combination of parameters, training data quality, and fine-tuning. For narrow tasks — classification, entity extraction, routing, simple chat — a small model is often faster, cheaper, and good enough. For deep reasoning, broad world knowledge, or nuanced writing, the extra parameters of a large model start to pay off.

Choosing the right size

Match the model to the job. For on-device or low-latency work, start with a 7B model and only scale up if quality is insufficient. For general assistant tasks on a single modern GPU, a 13B model is a strong default. For demanding reasoning, coding, or research, reach for 70B or use a hosted frontier model where you pay per token instead of managing hardware. Always benchmark candidates on your own data: the smallest model that clears your quality bar is almost always the right economic choice.

Ad placeholder (rectangle)