What an LLM is
A large language model (LLM) is a neural network, in practice almost always a transformer, trained on an enormous corpus of text to predict the next unit of text given everything before it. The word “large” carries two meanings at once: the model is trained on a vast quantity of data (often a sizeable slice of the public web plus books and code), and the model itself is large, with parameter counts ranging from hundreds of millions to hundreds of billions or more. The GPT models behind ChatGPT, along with Claude, Gemini, and Llama, are all LLMs.
How it is trained
The dominant training method is next-token prediction, a form of self-supervised learning. The model is shown a passage of text with the continuation hidden and asked to predict what comes next; its weights are then adjusted by gradient descent so that it assigns a higher probability to the token that actually follows. Repeated across trillions of tokens, this single objective forces the model to internalise grammar, facts, writing styles, and a surprising amount of reasoning, because predicting the next token well in arbitrary text requires modelling the patterns behind it. No human writes explicit rules; the knowledge is learned implicitly from the data.
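The training loop described above can be sketched at toy scale. The example below is a minimal illustration, not how production LLMs are built: it assumes a bigram model (the prediction depends only on the previous token), a thirteen-word corpus, and plain full-batch gradient descent on the cross-entropy loss. The names train_bigram and predict_next, the corpus, and the hyperparameters are all hypothetical choices made for this sketch.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(tokens, vocab, epochs=300, lr=1.0):
    """Fit a table of logits W[prev][next] by next-token prediction."""
    index = {tok: i for i, tok in enumerate(vocab)}
    n = len(vocab)
    # The parameters: one row of logits per possible previous token.
    W = [[0.0] * n for _ in range(n)]
    # Self-supervised targets: each token's label is simply the token after it.
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    losses = []
    for _ in range(epochs):
        grad = [[0.0] * n for _ in range(n)]
        total_loss = 0.0
        for prev, target in pairs:
            probs = softmax(W[prev])
            total_loss += -math.log(probs[target])  # cross-entropy for this pair
            # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target).
            for j in range(n):
                grad[prev][j] += probs[j] - (1.0 if j == target else 0.0)
        # Gradient-descent step on the mean loss: raise the probability of the
        # tokens that actually followed, lower the rest.
        for i in range(n):
            for j in range(n):
                W[i][j] -= lr * grad[i][j] / len(pairs)
        losses.append(total_loss / len(pairs))
    return W, index, losses

def predict_next(W, index, vocab, prev):
    """Return the most probable next token after `prev`."""
    probs = softmax(W[index[prev]])
    return vocab[max(range(len(vocab)), key=lambda j: probs[j])]

corpus = "the cat sat on the mat and the cat sat on the rug".split()
vocab = sorted(set(corpus))
W, index, losses = train_bigram(corpus, vocab)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("after 'the':", predict_next(W, index, vocab, "the"))
```

No rule such as "cat follows the" is ever written down; the preference emerges because "cat" follows "the" more often than "mat" or "rug" in the toy corpus. A real LLM replaces the lookup table with a transformer that conditions on the whole preceding context, but the objective has the same shape.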
Why scale matters: emergent capabilities
LLMs became interesting because of scale. As models grew in parameters, data, and compute, they did not merely get a little better: certain abilities appeared abruptly once a size threshold was crossed. These are called emergent capabilities: skills on which smaller models perform near chance and larger models improve sharply, such as multi-step arithmetic, learning a task from a few examples given in the prompt (few-shot learning), and step-by-step reasoning. This is part of why the field invested so heavily in making models larger: some capability gains could not be predicted from small-scale experiments and only showed up at scale.
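One simple intuition for why a multi-step skill can look abrupt is compounding: if a task needs every step to be right, whole-task success is roughly the per-step accuracy raised to the number of steps. The sketch below is an illustrative assumption, not an established explanation of emergence; it treats steps as independent, and the function name task_success, the step count, and the accuracy values are hypothetical.

```python
def task_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` independent steps come out correct."""
    return per_step_accuracy ** steps

STEPS = 10  # hypothetical length of a multi-step arithmetic problem

for p in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"per-step accuracy {p:.2f} -> whole-task success {task_success(p, STEPS):.3f}")
```

Under these assumptions, raising per-step accuracy from 0.50 to 0.99 moves whole-task success from under 1% to about 90%, so a steady underlying improvement can register as a sudden jump on an all-or-nothing metric.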
What LLMs can and cannot do
LLMs are remarkably general. A single model can summarise documents, draft emails, write and explain code, translate, answer questions, and brainstorm — tasks that once needed separate specialised systems. That generality is their defining commercial advantage.
But an LLM is fundamentally a next-token predictor, not a reasoning engine with a model of the world. It has no grounded experience, no persistent goals, and no built-in notion of truth. It can hallucinate, that is, produce fluent, confident, and entirely false statements, because plausible-sounding text and true text look similar to a pattern matcher. Its knowledge is also frozen at its training cutoff unless the model is connected to live tools or retrieval. The practical takeaway: treat an LLM as a powerful drafting and reasoning assistant whose output you verify, not as an oracle.
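The absence of a built-in notion of truth can be made concrete with a toy decoder. The sketch below assumes a hand-written probability table for the token following a prompt; the numbers are invented for illustration and are not measurements from any real model. The helper names greedy and sample are likewise hypothetical.

```python
import random

prompt = "The capital of Australia is"

# Hypothetical next-token probabilities, invented for this sketch.
# Nothing in the table marks "Canberra" as the true answer.
next_token_probs = {"Sydney": 0.55, "Canberra": 0.40, "Melbourne": 0.05}

def greedy(probs):
    """Pick the single most probable token."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
greedy_answer = greedy(next_token_probs)
samples = [sample(next_token_probs, rng) for _ in range(1000)]
wrong_share = sum(tok != "Canberra" for tok in samples) / len(samples)

print(prompt, greedy_answer)
print(f"share of sampled answers that are wrong: {wrong_share:.2f}")
```

The decoder only ever consults probabilities, so if the plausible-but-wrong token scores highest, the fluent wrong answer is what comes out. This is why verification, retrieval, or tool use has to be added around the model rather than assumed inside it.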
Where the term sits
“LLM” is the umbrella term for the model class. Related entries describe the pieces: parameters are the learned weights inside the model, next-token prediction is the training objective, and the transformer is the architecture nearly all modern LLMs use. Understanding the LLM as “a very large transformer trained to predict text” is the foundation for understanding much of the rest of modern AI.