What Is a Large Language Model (LLM)?

The technology behind ChatGPT, Claude, and Gemini—simply explained

A plain-English definition

A large language model is a neural network trained on a vast amount of text to predict and generate language. Give it some words and it produces the token (a word or word-piece) it judges most likely to come next, then repeats the process one token at a time until it has written a response. That simple-sounding mechanism, applied at enormous scale, is what powers ChatGPT, Claude, Gemini, and their peers. The model is not a database that looks up stored answers, nor a program following hand-written rules. It is a statistical engine that has absorbed patterns of language and knowledge from its training data and reproduces them on demand.
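
To make the "predict the next piece, then repeat" loop concrete, here is a minimal sketch in Python. It is a toy word-pair counter, not a neural network, and the corpus and the function names `train_bigram` and `generate` are invented for illustration. Real LLMs use far richer context than the single previous word, but the generation loop has the same shape.

```python
from collections import Counter, defaultdict

# Toy corpus (an assumption for illustration); a real model sees vastly more text.
CORPUS = "the cat sat on the mat . the cat sat on the rug ."

def train_bigram(text):
    """Count, for each token, which tokens follow it in the corpus."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_new_tokens=5):
    """Greedy decoding: repeatedly append the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # the last token was never seen with a successor
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

model = train_bigram(CORPUS)
print(generate(model, "the", max_new_tokens=3))  # the cat sat on
```

Notice that the output is whatever was most common in the corpus, not whatever is true, which is the same property that makes full-scale models fluent but fallible.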

What the “large” really means

The word “large” is literal and refers to two quantities. The first is parameters: the internal numbers the model adjusts during training, now counted in the billions or hundreds of billions. The second is training data: typically hundreds of billions to trillions of words drawn from books, websites, code, and articles. Researchers found that scaling both up did not just make models incrementally better. Past certain scales, models began to show abilities that smaller ones lacked, such as coherent multi-step reasoning, following novel instructions, and translating between languages they were never explicitly trained to translate. This scaling effect is why the field pursued ever-bigger models and why “large” became the defining adjective.
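
For a rough sense of what "billions of parameters" means in practice, the arithmetic can be sketched as below. The layer width of 4096 and the 7-billion-parameter total are hypothetical round numbers, not the specification of any particular model, and the memory figure covers only storing the parameters at 2 bytes each (16-bit numbers).

```python
def dense_layer_params(n_inputs, n_outputs):
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs

def memory_gigabytes(n_params, bytes_per_param=2):
    """Storage for the parameters alone, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9

# A single fully connected layer of hypothetical width 4096.
print(dense_layer_params(4096, 4096))   # 16781312

# A hypothetical 7-billion-parameter model stored as 16-bit numbers.
print(memory_gigabytes(7_000_000_000))  # 14.0
```

A single modest layer already holds almost 17 million adjustable numbers, so stacking many wider layers reaches the billions quickly.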

How LLMs are trained

Training happens in stages. The dominant first stage is next-token prediction: the model reads massive amounts of text and repeatedly tries to predict the next token (a word or word-piece), getting corrected when wrong, billions of times. To predict the next token well across so much human writing, it must implicitly learn grammar, facts, style, and reasoning patterns, so knowledge ends up encoded in its parameters as a side effect of getting good at prediction. A second stage, fine-tuning, often with reinforcement learning from human feedback, teaches the raw model to follow instructions, stay helpful, and avoid harmful output. The chatbot you talk to is the product of both stages.
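
The "predict, get corrected, repeat" cycle of the first stage can be sketched in a few lines of Python. This is a deliberately tiny stand-in, assuming a six-word corpus and a table of scores in place of a neural network. The correction rule is the standard gradient of cross-entropy loss for a softmax output, and the learning rate and step count are arbitrary choices for illustration.

```python
import math

# Toy training text (an assumption for illustration).
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
index = {tok: i for i, tok in enumerate(vocab)}
# Training examples: (current token id, actual next token id).
pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]

# The "parameters": one score per (current token, candidate next token).
logits = [[0.0] * len(vocab) for _ in vocab]

def softmax(row):
    """Turn a row of scores into probabilities that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def average_loss():
    """Mean cross-entropy: how surprised the model is by the true next token."""
    return sum(-math.log(softmax(logits[c])[n]) for c, n in pairs) / len(pairs)

def train_step(lr=0.5):
    """For each example, nudge scores toward the token that actually came next."""
    for c, n in pairs:
        probs = softmax(logits[c])
        for j in range(len(vocab)):
            grad = probs[j] - (1.0 if j == n else 0.0)
            logits[c][j] -= lr * grad

loss_before = average_loss()
for _ in range(100):
    train_step()
loss_after = average_loss()
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The loss falls as the scores adjust, but it cannot reach zero here because "the" is followed by both "cat" and "mat" in the text, so the best the model can do is split its probability between them.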

What LLMs can and cannot do

LLMs are remarkably capable at language-centred tasks: drafting and editing text, summarising, translating, answering questions, writing and explaining code, and reasoning through problems step by step. But their generative nature has a built-in risk. Because they produce likely text rather than verified text, they can hallucinate — state false information with complete fluency and confidence. They also have a fixed knowledge cutoff and no inherent access to live data unless a tool provides it. The practical rule is to treat an LLM as a brilliant, fast, but occasionally unreliable assistant: superb for drafting and exploration, and always to be fact-checked where accuracy matters.

How they differ from older NLP

Before LLMs, natural language processing meant building a separate, narrowly trained model for each task (one for sentiment, another for translation, another for entity extraction), each requiring its own labelled dataset. LLMs collapsed this into a single general-purpose model you steer with plain-language instructions. Ask it to translate, summarise, or classify and it simply does, with no task-specific training required. That generality, unlocked largely by scale and the next-token objective and then sharpened by instruction fine-tuning, is the leap that separates today’s models from the NLP of a decade ago.
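
In code, the shift looks roughly like the sketch below: one function handles every task, and only the instruction changes. The names `build_prompt`, `run_task`, and `echo_model` are hypothetical. `echo_model` is a placeholder that returns its prompt unchanged so the example runs without any real model, and in practice it would be replaced by a call to whichever LLM client you use.

```python
def build_prompt(instruction, text):
    """Combine a plain-language instruction with the input text."""
    return f"{instruction}\n\nText: {text}"

def run_task(call_model, instruction, text):
    """Send one prompt to the model; call_model stands in for any LLM client."""
    return call_model(build_prompt(instruction, text))

def echo_model(prompt):
    """Placeholder model: returns the prompt unchanged so the sketch runs offline."""
    return prompt

review = "The battery died after two days."
tasks = [
    "Classify the sentiment of this text as positive or negative.",
    "Translate this text into French.",
    "Summarise this text in five words.",
]

# Same model, same code path; only the instruction differs per task.
outputs = [run_task(echo_model, instruction, review) for instruction in tasks]
for out in outputs:
    print(out)
    print("---")
```

Under the older approach, each of those three tasks would have needed its own trained model and its own labelled dataset.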
