Question 1

What makes a language model 'large'?

Accepted Answer

Two things: the number of parameters (the internal numbers the model adjusts during training, now commonly in the billions and, in the largest models, hundreds of billions or more) and the size of the training data (typically hundreds of billions to trillions of tokens, where a token is a word or word fragment). Scaling both up produced a qualitative jump in capability over earlier, smaller models, so the 'large' is literal, not marketing.
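
To make the parameter count concrete, here is a rough back-of-envelope sketch. It assumes a standard decoder-only transformer in which each layer holds about 12 × d_model² weights (4 d_model² in the attention projections, 8 d_model² in a feed-forward block four times wider than d_model), plus a token-embedding table; biases and normalization weights are ignored. The configuration (32 layers, d_model = 4096, a 32,000-token vocabulary) and the function names are hypothetical choices for illustration, not the specification of any particular model.

```python
def approx_param_count(n_layers: int, d_model: int, vocab_size: int) -> int:
    """Rough parameter count for a decoder-only transformer (assumption: 12*d^2 per layer)."""
    attention = 4 * d_model ** 2        # query, key, value and output projections
    feed_forward = 8 * d_model ** 2     # two matrices, assuming a hidden width of 4*d_model
    embeddings = vocab_size * d_model   # one learned vector per token in the vocabulary
    return n_layers * (attention + feed_forward) + embeddings


def weight_memory_gb(n_params: int, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights (2 bytes each assumes 16-bit floats)."""
    return n_params * bytes_per_param / 1e9


# Hypothetical configuration, chosen only to illustrate the scale involved.
params = approx_param_count(n_layers=32, d_model=4096, vocab_size=32_000)
print(f"{params:,} parameters")                          # 6,573,522,944 parameters
print(f"{weight_memory_gb(params):.1f} GB at 16 bits")   # 13.1 GB at 16 bits
```

Under these assumptions the configuration comes to about 6.6 billion parameters, and storing its weights alone takes about 13 GB of memory, which is one practical sense in which such models are 'large'.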

Question 2

How is an LLM trained?

Accepted Answer

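Primarily by next-token prediction: the model reads enormous amounts of text and is repeatedly trained to predict the next token (a word or word fragment) given all the tokens before it, with its parameters nudged after each prediction to make the actual next token more likely. Through this single objective at massive scale it absorbs grammar, facts, and reasoning patterns. A second stage of fine-tuning, on example instructions and responses and often with reinforcement learning from human feedback (RLHF), makes it follow instructions and behave more helpfully and safely.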
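
The next-token objective can be illustrated with a deliberately tiny stand-in. The sketch below treats whitespace-separated words as tokens and 'trains' a bigram model by counting which token follows which, rather than by adjusting a neural network's parameters with gradient descent as a real LLM does. What it shares with LLM pretraining is the score: the average negative log-probability the model assigns to each actual next token (cross-entropy loss), which training tries to drive down. The corpus and function names are illustrative only.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each preceding token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_token_probs(counts, prev):
    """Convert the counts observed after `prev` into a probability distribution."""
    following = counts.get(prev, Counter())  # .get avoids inserting unseen tokens
    total = sum(following.values())
    return {tok: c / total for tok, c in following.items()}


def avg_next_token_loss(counts, tokens):
    """Cross-entropy: mean negative log-probability of each actual next token."""
    losses = [
        -math.log(next_token_probs(counts, prev)[nxt])
        for prev, nxt in zip(tokens, tokens[1:])
    ]
    return sum(losses) / len(losses)


corpus = "the cat sat on the mat the cat ate".split()
model = train_bigram(corpus)
print(next_token_probs(model, "the"))  # {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
print(round(avg_next_token_loss(model, corpus), 3))  # 0.412
```

A real LLM conditions on the whole preceding context rather than a single previous token, and learns from vastly more text, but the quantity it is trained to minimize has the same form.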

Question 3

Do LLMs know facts or just predict words?

Accepted Answer

Both, in a sense. An LLM stores statistical knowledge from its training data in its parameters, so it can recall many facts. But it generates output by predicting likely text, not by looking facts up, so it can produce fluent, confident statements that are false; these are called hallucinations. Always verify important facts against an authoritative source.
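
The gap between 'predicting likely text' and 'looking things up' shows up in how output is produced: the model assigns probabilities to possible next tokens and one is picked, often by random sampling. The sketch below uses made-up probabilities for the prompt 'The capital of Australia is'; they are invented for illustration and do not come from any real model. The `temperature` parameter mirrors a common generation setting, while the function name and the numbers are hypothetical.

```python
import random
from collections import Counter

# Hypothetical next-token probabilities for the prompt "The capital of Australia is".
# These numbers are invented for illustration; they do not come from any real model.
NEXT_TOKEN_PROBS = {"Canberra": 0.6, "Sydney": 0.3, "Melbourne": 0.1}


def sample_next(probs, temperature=1.0, rng=random):
    """Sample one token. Raising p to 1/temperature is equivalent to dividing the
    logits by the temperature before softmax; lower values favour the top token."""
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = Counter(sample_next(NEXT_TOKEN_PROBS, rng=rng) for _ in range(1000))
print(draws)  # how often each answer was sampled
```

Nothing in this process is broken when it outputs 'Sydney': a fluent but false continuation simply had enough probability to be chosen. Lowering the temperature concentrates the choice on the most likely token, but that only helps when the most likely token happens to be correct, which is why checking an authoritative source still matters.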

Question 4

How are LLMs different from older NLP systems?

Accepted Answer

Earlier natural language processing (NLP) relied on hand-built rules or on narrow models trained for a single task, such as sentiment analysis, translation, or named-entity recognition. LLMs are general-purpose: one model can handle many tasks from natural-language instructions, often with no task-specific training, because scale gave them in-context learning, the ability to pick up a task from instructions or a few examples in the prompt, which older systems lacked.
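
In practice, handling 'many tasks via natural-language instructions' means the task is specified in the prompt rather than in the model's training. The sketch below only builds prompt strings: one function produces both a sentiment-classification prompt and a translation prompt from an instruction plus a few worked examples (few-shot prompting). The 'Input:'/'Output:' layout and the function name are illustrative conventions, not a required format, and sending the prompt to a model is deliberately left out because it depends on whichever API you use.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples and a new input into one prompt string."""
    lines = [instruction, ""]
    for source, target in examples:
        lines += [f"Input: {source}", f"Output: {target}", ""]
    lines += [f"Input: {query}", "Output:"]  # the model is expected to continue from here
    return "\n".join(lines)


# The same general-purpose model could receive either prompt; only the text changes.
sentiment_prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("I loved every minute.", "positive"), ("A total waste of money.", "negative")],
    "The plot dragged, but the acting was superb.",
)
translation_prompt = build_few_shot_prompt(
    "Translate each sentence from English to French.",
    [("Good morning.", "Bonjour."), ("Thank you very much.", "Merci beaucoup.")],
    "Where is the station?",
)
print(sentiment_prompt)
```

An older NLP pipeline would typically need a separately trained classifier for the first task and a separate translation system for the second; here the difference between the tasks lives entirely in the prompt text.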

What Is a Large Language Model (LLM)?

A plain-English definition

What the “large” really means

How LLMs are trained

What LLMs can and cannot do

How they differ from older NLP