Weight (AI Glossary)

A single learnable number in a neural network that gets updated during training


What a weight is

A weight is a single learnable number inside a neural network. In a fully connected layer, every connection between two artificial neurons has its own weight. When a value flows along that connection, the network multiplies it by the weight: a weight with a large magnitude amplifies the signal, a weight near zero suppresses it, and a negative weight flips its sign. Large language models contain billions of these numbers, and together they encode what the model has learned.
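As a minimal illustration, assuming a single connection and made-up values, the snippet below multiplies one incoming signal by three different weights to show amplification, suppression, and a sign flip. The function name `pass_through` is hypothetical and not part of any library.

```python
# A single connection: the weight scales whatever value flows along it.
# The signal and weight values are made up for illustration.

def pass_through(value: float, weight: float) -> float:
    """Return the value after it crosses a connection with the given weight."""
    return value * weight

signal = 0.8
for weight in (3.0, 0.01, -1.0):  # large, near zero, negative
    print(f"weight={weight:+.2f} -> output={pass_through(signal, weight):+.3f}")
```

Running it should show the output growing to 2.4, shrinking to roughly 0.008, and turning into -0.8.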

Weights and biases

Each neuron computes a weighted sum of its inputs and then adds one more learnable number called a bias. The bias shifts the result up or down regardless of the inputs, so a neuron can still produce a nonzero output even when its inputs are small or zero. Weights and biases are both parameters, the adjustable knobs of the network, and when people say a model “has 7 billion parameters,” they are counting all of its weights and biases combined.
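A rough sketch of the two ideas in this paragraph, assuming a plain fully connected layer with no activation function: `neuron` computes the weighted sum plus bias, and `dense_layer_param_count` counts one weight per input-output pair plus one bias per output. Both function names and all numbers are hypothetical.

```python
def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs plus a bias (no activation function applied)."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def dense_layer_param_count(n_inputs: int, n_outputs: int) -> int:
    """Weights (n_inputs * n_outputs) plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

# With all-zero inputs, only the bias contributes to the output.
print(neuron([0.0, 0.0], [0.7, -1.3], bias=0.4))
print(dense_layer_param_count(784, 128))
```

For example, a fully connected layer from 784 inputs to 128 outputs has 784 × 128 = 100,352 weights plus 128 biases, or 100,480 parameters in total.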

How weights are initialised

Before training begins, the weights have to start somewhere. They are usually set to small random values drawn from a distribution whose spread is scaled to the size of the layer, for example Xavier (also called Glorot) or He initialisation. Random initialisation breaks symmetry: if every weight started with the same value, every neuron in a layer would receive identical gradients and learn the same thing. The scaling keeps signals from exploding or vanishing as they pass through many layers. Biases are often initialised to zero.
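Below is a minimal sketch of the two schemes using only Python's `random` module, assuming the common textbook forms: Xavier uniform draws from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), and He normal draws from an untruncated normal with standard deviation sqrt(2 / fan_in). Deep-learning frameworks ship their own versions (sometimes with truncation or extra gain options); the helper names here are hypothetical.

```python
import math
import random

def xavier_uniform(fan_in: int, fan_out: int) -> list[list[float]]:
    """Glorot/Xavier uniform: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[random.uniform(-limit, limit) for _ in range(fan_in)]
            for _ in range(fan_out)]

def he_normal(fan_in: int, fan_out: int) -> list[list[float]]:
    """He (Kaiming) normal: N(0, std^2) with std = sqrt(2 / fan_in), common for ReLU layers."""
    std = math.sqrt(2.0 / fan_in)
    return [[random.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

fan_in, fan_out = 256, 128
W = xavier_uniform(fan_in, fan_out)  # one row of weights per output neuron
b = [0.0] * fan_out                  # biases start at zero
```

Both schemes shrink the spread as layers get wider, which is what keeps the variance of each neuron's weighted sum roughly stable from layer to layer.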

How weights get updated

Training is the process of finding good values for every weight:

  1. Forward pass — feed an input through the network and produce a prediction.
  2. Loss — measure how wrong the prediction is compared to the correct answer.
  3. Backpropagation — work backwards to compute how much each weight contributed to that error (its gradient).
  4. Gradient descent — nudge every weight a small step in the direction that reduces the loss.

Repeated over millions of examples, this gradually sculpts the weights so the model’s predictions improve. The size of each step is controlled by the learning rate.
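The four steps can be sketched on a toy model with just one weight `w` and one bias `b`, fitted to points drawn from y = 2x + 1. This is an illustrative assumption, not how large networks are trained in practice: it uses five data points, full-batch updates, mean squared error, and hand-derived gradients in place of automatic backpropagation. Each update follows w ← w − learning_rate × ∂L/∂w, and likewise for `b`.

```python
# Toy data from y = 2x + 1 (hypothetical example).
data = [(x, 2.0 * x + 1.0) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]

w, b = 0.0, 0.0        # the model's two parameters
learning_rate = 0.05   # step size (assumed value)

for step in range(500):
    grad_w = grad_b = 0.0
    loss = 0.0
    for x, y_true in data:
        y_pred = w * x + b                  # 1. forward pass
        error = y_pred - y_true
        loss += error ** 2 / len(data)      # 2. loss: mean squared error
        # 3. backpropagation: the chain rule gives dL/dw and dL/db
        grad_w += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # 4. gradient descent: move each parameter against its gradient
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

With these settings the loop should settle very close to w = 2 and b = 1. Raising the learning rate speeds this up, but for this particular dataset any value above 0.5 makes the updates overshoot and diverge.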

Frozen weights and fine-tuning

When you fine-tune a pre-trained model, you do not always retrain every weight. Instead you can freeze most of them, holding them fixed so gradient descent skips them, and update only a small set of layers (or lightweight adapters such as LoRA, which add small low-rank trainable matrices alongside the frozen weights). Freezing preserves the broad knowledge already baked into the model, needs far less memory and compute, and reduces the risk of catastrophic forgetting, where training on the new task overwrites useful knowledge the model had already learned.
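A minimal sketch of freezing, assuming a hypothetical `Parameter` record with a `frozen` flag and a plain gradient-descent step that skips frozen entries. Frameworks such as PyTorch express the same idea by setting `requires_grad = False` on a parameter. The LoRA count assumes the standard formulation, in which a frozen d × k matrix W is adapted as W + BA with B of shape d × r and A of shape r × k; the matrix size and rank below are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Parameter:
    name: str
    values: list[float]
    grads: list[float]
    frozen: bool = False

def sgd_step(params: list[Parameter], learning_rate: float) -> None:
    """Plain gradient descent that leaves frozen parameters untouched."""
    for p in params:
        if p.frozen:
            continue  # frozen weights keep their pre-trained values
        p.values = [v - learning_rate * g for v, g in zip(p.values, p.grads)]

def trainable_count(params: list[Parameter]) -> int:
    """Number of individual values that gradient descent will update."""
    return sum(len(p.values) for p in params if not p.frozen)

def lora_trainable_count(d: int, k: int, rank: int) -> int:
    """LoRA trains B (d x rank) and A (rank x k) instead of the full d x k matrix."""
    return d * rank + rank * k

# Hypothetical "pre-trained" model: two frozen body layers and a new task head.
params = [
    Parameter("layer1.weight", [0.4, -0.2, 0.7], [0.1, 0.1, 0.1], frozen=True),
    Parameter("layer2.weight", [1.1, 0.3], [0.2, -0.2], frozen=True),
    Parameter("head.weight", [0.0, 0.0], [0.5, -0.5]),
]
sgd_step(params, learning_rate=0.1)
print(trainable_count(params))               # only the head is trainable
print(lora_trainable_count(4096, 4096, 8))   # vs 4096 * 4096 for the full matrix
```

After one step only `head.weight` changes. For a 4096 × 4096 matrix with rank 8, LoRA trains 65,536 values instead of 16,777,216.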

In short, weights are the memory of a neural network: random at birth, shaped by gradient descent during training, and selectively frozen when you adapt a model to a new task.
