Parameters (AI Glossary)

The learned weights and biases that define what an AI model knows

What parameters are

Parameters are the numbers inside a neural network that the model learns during training. The vast majority are weights — values that scale the connections between units — together with a smaller number of biases that offset each unit’s output. When people say a model has “7 billion parameters” or “405 billion parameters,” they are counting these adjustable numbers. Parameters are where a trained model’s knowledge lives: the patterns, facts, and skills the model has acquired are encoded entirely in the specific values its parameters settled on after training.
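
As a rough illustration of what gets counted, the sketch below tallies the weights and biases of a small fully connected network. The layer sizes are hypothetical. Real language models also contain attention, embedding, and normalisation parameters that this toy count ignores, but the same rule applies: every learned number adds one to the total.

```python
def dense_layer_params(n_in: int, n_out: int) -> tuple[int, int]:
    """Return (weights, biases) for one fully connected layer."""
    weights = n_in * n_out  # one weight per input-to-output connection
    biases = n_out          # one bias per output unit
    return weights, biases


def mlp_param_count(layer_sizes: list[int]) -> dict[str, int]:
    """Count weights and biases for a stack of dense layers.

    layer_sizes lists the width of each layer, input first,
    e.g. [784, 256, 10] (a hypothetical toy network).
    """
    total_weights = 0
    total_biases = 0
    # Pair each layer with the next one: (784, 256), then (256, 10).
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        w, b = dense_layer_params(n_in, n_out)
        total_weights += w
        total_biases += b
    return {
        "weights": total_weights,
        "biases": total_biases,
        "total": total_weights + total_biases,
    }


if __name__ == "__main__":
    print(mlp_param_count([784, 256, 10]))
```

For this toy network the weights outnumber the biases by several hundred to one, which matches the point that the vast majority of parameters are weights.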

How parameters store knowledge

A neural network turns input into output through a long chain of multiply-and-add operations, and the parameters supply those numbers: weights are the multipliers and biases are the added offsets. During training, the model makes a prediction, measures how wrong it was with a loss function, and uses gradients (computed by backpropagation) to nudge every parameter slightly in the direction that reduces that error. It repeats this cycle over a very large number of steps and vast amounts of data. Over the course of training, the parameters settle on values that capture the regularities of the data. There is no human-readable database of facts inside the model; the “knowledge” is a distributed pattern across millions or billions of numbers, compressed so that the right output emerges from the right input.
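
To make the predict, measure, adjust loop concrete, here is a minimal sketch that trains a two-parameter model (one weight w and one bias b) with plain gradient descent on mean squared error. The function name train_linear, the toy data, and the learning-rate and step values are illustrative assumptions. Real networks apply the same idea to far more parameters, backpropagate through many layers, and usually use optimizers such as Adam rather than plain gradient descent.

```python
import random


def train_linear(data, learning_rate=0.05, steps=2000, seed=0):
    """Fit y = w * x + b by gradient descent on mean squared error."""
    rng = random.Random(seed)
    # Parameters start as arbitrary small numbers; training moves them.
    w = rng.uniform(-0.1, 0.1)
    b = rng.uniform(-0.1, 0.1)
    n = len(data)
    for _ in range(steps):
        # Predict, measure the error, and accumulate the loss gradients.
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Nudge each parameter against its gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1.
    samples = [(x, 2 * x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
    w, b = train_linear(samples)
    print(f"w={w:.3f}, b={b:.3f}")
```

After training, the only “knowledge” this model holds is the pair of numbers w and b, which end up very close to the 2 and 1 used to generate the data.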

Parameters versus hyperparameters

It is easy to confuse two similar-sounding terms.

  • Parameters are learned by the model during training — the weights and biases.
  • Hyperparameters are set by humans and govern the model’s shape and the training process: the number of layers, learning rate, batch size, number of training steps, and so on.

A useful way to remember it: parameters are the model’s answers; hyperparameters are the exam conditions. You tune hyperparameters so that training produces good parameters.
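
The split shows up even in a toy example. In the sketch below, the hyperparameters (a learning rate and a step count) are fixed by hand before training, while the single parameter w is learned from data. The Hyperparameters class and the fit_slope function are hypothetical names for this illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparameters:
    """Chosen by a human before training; never updated by the model."""
    learning_rate: float
    steps: int


def fit_slope(data, hp: Hyperparameters) -> float:
    """Learn a single parameter w so that y ~ w * x (mean squared error)."""
    w = 0.0  # the parameter: starts at zero and is learned from data
    n = len(data)
    for _ in range(hp.steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= hp.learning_rate * grad  # the update rule reads the hyperparameter
    return w


if __name__ == "__main__":
    data = [(x, 3 * x) for x in (1.0, 2.0, 3.0)]  # toy data with slope 3
    for hp in (Hyperparameters(0.01, 5), Hyperparameters(0.01, 500)):
        print(hp, "-> learned w =", round(fit_slope(data, hp), 3))
```

With the same data, the short run stops well short of the true slope of 3 while the longer run reaches it: changing the exam conditions changes the answer the model ends up with.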

Parameter count and capability

For years, bigger meant better: increasing parameter counts unlocked new capabilities and lower error. Parameter count remains a rough proxy for a model’s capacity to learn. But it is not the whole story. Data quality and quantity, the architecture, and the training recipe all shape final performance, and a smaller model trained on better data with a better method can outperform a larger one. The field has moved from “scale parameters at all costs” toward balancing parameters with data and efficiency.

Why parameter count drives cost

Every parameter is a number that must be stored in memory and used in computation, so parameter count strongly determines a model’s memory footprint and the compute cost of each prediction. A larger model needs more memory and compute, and therefore more or larger accelerators, and it costs more to serve at scale. This economic pressure is why architectures like Mixture of Experts aim to raise total parameters (capacity) while keeping active parameters per token (cost) low, and why techniques such as quantisation shrink the storage each parameter needs.
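
The memory side of this is simple arithmetic: parameter count times bytes per parameter. The sketch below assumes common storage widths (4 bytes for fp32, 2 for fp16/bf16, 1 for int8, half a byte for 4-bit quantisation) and counts weights only, ignoring activations, caches, and serving overhead. The Mixture of Experts helper uses a simplified, hypothetical layout (shared parameters plus equally sized experts) and made-up sizes purely to show how total and active counts diverge.

```python
# Assumed bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate storage for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def moe_params(shared: float, per_expert: float,
               num_experts: int, active_experts: int) -> tuple[float, float]:
    """Return (total, active-per-token) counts for a simplified MoE model.

    'shared' covers parameters every token uses; each token is routed
    to only 'active_experts' of the 'num_experts' equally sized experts.
    """
    if not 0 < active_experts <= num_experts:
        raise ValueError("active_experts must be between 1 and num_experts")
    total = shared + num_experts * per_expert
    active = shared + active_experts * per_expert
    return total, active


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"7B model, {fmt}: {weight_memory_gb(7e9, fmt):.1f} GB")
    total, active = moe_params(shared=2e9, per_expert=1e9,
                               num_experts=16, active_experts=2)
    print(f"MoE: {total / 1e9:.0f}B total, {active / 1e9:.0f}B active per token")
```

Under these assumptions a 7-billion-parameter model needs about 28 GB for its weights in fp32 and 14 GB in fp16, and 4-bit quantisation brings that down to about 3.5 GB.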
