What a parameter is
A parameter is simply a number that an AI model learned during training. A modern model is, under the hood, a vast collection of these numbers — most of them weights, with some biases. When people say a model has “7 billion parameters,” they mean it contains about seven billion such learned values. Together, these numbers encode everything the model knows: the patterns in language, facts, and skills it picked up from its training data.
Weights and biases
Inside a neural network, information flows through layers of artificial neurons:
- Weights decide how strongly one neuron’s output influences the next. Most parameters are weights.
- Biases shift a neuron’s output up or down, giving the model extra flexibility.
During training, the model repeatedly adjusts these values to reduce its prediction error. Learning is the gradual tuning of millions or billions of parameters until the model’s outputs match the desired patterns in the data.
What parameter count tells you
Parameter count — the “7B”, “13B”, “70B” you see in model names (B = billion) — is a rough proxy for a model’s capacity:
- More parameters → more room to store knowledge and represent subtle patterns, but more memory and compute to train and run.
- Fewer parameters → cheaper and faster, but potentially less capable on complex tasks.
It is a useful first-glance signal of how powerful and how heavy a model is, but it is not the whole story.
Why bigger is not always better
Scaling up parameters helps — up to a point — but the benefits diminish, and size is only one factor. A smaller model trained on more and cleaner data, or with a better architecture and good fine-tuning, can outperform a larger one. The field has repeatedly shown that data quality, training method, and alignment matter as much as raw size. So a well-trained 7B model can be more useful for many tasks than a poorly trained 70B model.
Parameters and your hardware
Because every parameter occupies memory, parameter count largely determines the hardware you need. At 16-bit precision, each parameter takes two bytes, so a 70B model needs roughly 140 GB just to hold its weights, while a 7B model needs about 14 GB. This is exactly why quantization exists — by storing parameters in 8- or 4-bit form, you can shrink that footprint several-fold and fit large models on ordinary hardware. Understanding parameters, then, is the key to predicting both a model’s capability and what it will take to run it. For the bigger picture of how these models work, see What Is a Large Language Model?.