How to use this glossary
AI moves fast and its vocabulary moves with it. This reference defines the terms you will actually meet in documentation, tutorials, model cards, and papers. The definitions are concise and grouped roughly from foundations to advanced. You do not need to memorise them; bookmark this page and look words up as they come. Master the core dozen (token, parameter, prompt, context window, embedding, fine-tuning, RAG, hallucination, temperature, inference, transformer, attention) and most everyday AI discussion becomes readable.
Foundations
Artificial intelligence (AI): the broad field of building systems that perform tasks normally thought to require human intelligence.
Machine learning (ML): a subset of AI in which systems learn patterns from data instead of being explicitly programmed.
Deep learning: ML using many-layered neural networks; the engine behind modern AI.
Neural network: a model of interconnected nodes (“neurons”) whose connection strengths (weights) are tuned during training.
Training: the process of adjusting a model’s weights to fit data.
Inference: running a trained model on new input to get an output, without changing its weights.
Parameter: one learned value, such as a weight; model size is measured in parameters (millions to trillions).
Dataset: the collection of examples a model learns from.
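To make the difference between training, inference, parameters, and a dataset concrete, here is a minimal sketch. It assumes the simplest possible model, a straight line with two parameters fitted by gradient descent. The names `dataset`, `train`, and `infer` are illustrative only, and real neural networks have vastly more parameters, but the division of labour is the same.

```python
# Toy dataset: (input, target) pairs that follow y = 2x + 1.
dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def train(data, learning_rate=0.05, epochs=2000):
    """Training: repeatedly nudge the weights to reduce mean squared error."""
    w, b = 0.0, 0.0  # the model's two parameters, before any learning
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def infer(params, x):
    """Inference: apply the trained weights to a new input; nothing is updated."""
    w, b = params
    return w * x + b


params = train(dataset)           # learned weights, close to (2.0, 1.0)
prediction = infer(params, 10.0)  # close to 21.0
print(params, prediction)
```

Training is the expensive loop that runs once; inference is the cheap function call that runs every time the model is used.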
Language model essentials
Large language model (LLM): a neural network trained on vast amounts of text to predict the next token.
Token: a chunk of text (often a word-piece) that the model reads and writes; the unit in which pricing and context length are measured.
Tokenisation: splitting text into tokens.
Prompt: the input text you give the model.
System prompt: instructions, usually hidden from the end user, that set the assistant’s persona, rules, and tone.
Context window: the maximum number of tokens a model can consider at once.
Temperature: a setting controlling randomness; low values give focused, near-deterministic output, high values give more creative and varied output.
Top-p / nucleus sampling: another randomness control that samples only from the smallest set of most probable tokens whose combined probability reaches p.
Embedding: a numeric vector capturing the meaning of text, so that texts with similar meanings sit close together.
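Temperature and top-p are easier to grasp with numbers. The sketch below assumes a four-word vocabulary with made-up scores (logits). The helper names `softmax_with_temperature`, `top_p_filter`, and `sample_token` are hypothetical, not any particular library's API, and production samplers differ in detail.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, p):
    """Keep the smallest set of most probable tokens whose mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= p:
            break
    # Renormalise so the kept probabilities sum to 1 again.
    return {i: probs[i] / mass for i in kept}


def sample_token(vocab, logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token after applying temperature, then the top-p cut-off."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return vocab[rng.choices(indices, weights=weights, k=1)[0]]


vocab = ["cat", "dog", "fish", "xylophone"]  # made-up vocabulary
logits = [2.0, 1.5, 0.5, -1.0]               # made-up model scores

cold = softmax_with_temperature(logits, 0.2)  # mass concentrates on "cat"
hot = softmax_with_temperature(logits, 2.0)   # mass spreads across tokens
nucleus = top_p_filter(softmax_with_temperature(logits, 1.0), 0.8)
print(cold, hot, nucleus)
print(sample_token(vocab, logits, temperature=1.0, top_p=0.8))
```

With these scores and `top_p=0.8`, only "cat" and "dog" survive the cut, so unlikely tokens such as "xylophone" can never be sampled, however high the temperature.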
Architecture and training terms
Transformer: the dominant neural architecture behind modern LLMs, built on attention.
Attention: the mechanism that lets a model weigh which tokens matter for each prediction.
Autoregressive: generating one token at a time, left to right, each conditioned on all the tokens before it.
Pre-training: the initial, broad training pass on general data.
Fine-tuning: further training on a narrow dataset to specialise behaviour.
LoRA (low-rank adaptation): an efficient fine-tuning method that freezes the original weights and trains only a small set of added low-rank weights.
RLHF: reinforcement learning from human feedback, used to align model behaviour with human preferences.
Mixture of experts (MoE): an architecture that routes each token to a few specialised sub-networks (“experts”), so only part of the model runs for any given input.
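The attention and autoregressive entries can be illustrated with a small sketch of causal scaled dot-product attention. It is a simplification under stated assumptions: a real transformer derives queries, keys, and values from learned projections and runs many attention heads in parallel, whereas this toy version reuses the same hand-picked vectors for all three. The function name `causal_self_attention` is illustrative.

```python
import math


def softmax(scores):
    """Convert scores to weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_self_attention(queries, keys, values):
    """For each position, blend the values of that position and earlier ones."""
    dim = len(keys[0])
    width = len(values[0])
    outputs, all_weights = [], []
    for t, query in enumerate(queries):
        # Causal mask: position t may only look at positions 0..t.
        scores = [dot(query, keys[j]) / math.sqrt(dim) for j in range(t + 1)]
        weights = softmax(scores)
        mixed = [
            sum(w * values[j][k] for j, w in enumerate(weights))
            for k in range(width)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors; the same list stands in for queries, keys, and values.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_self_attention(vectors, vectors, vectors)
for position, row in enumerate(weights):
    print(position, [round(w, 3) for w in row])
```

Each row of weights has one entry per visible token, so the third token attends over all three positions while the first can only attend to itself. That restriction to the current and earlier positions is what makes left-to-right generation possible.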
Application and behaviour terms
RAG (retrieval-augmented generation): fetching relevant documents and feeding them to the model so answers are grounded in real sources.
Hallucination: a confident, fluent output that is factually wrong or unsupported by any source.
Grounding: tying outputs to verifiable sources.
Zero-shot / few-shot: prompting with no examples versus a handful of examples.
Chain of thought: prompting the model to reason step by step before answering.
In-context learning: the model adapting to examples in the prompt without any weight updates.
Function calling / tool use: letting the model emit structured calls to external tools or APIs, which the surrounding application then executes.
Multimodal: handling more than text, such as images, audio, or video.
Agent: an AI system that plans and takes multi-step actions toward a goal.
Quantisation: shrinking a model by storing weights at lower precision to save memory and speed up inference.
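The retrieval half of RAG can be sketched in a few lines. This is a toy illustration under loud assumptions: real systems usually rank documents by embedding similarity in a vector index, whereas this sketch substitutes simple word-count cosine similarity. The documents and the names `bag_of_words`, `retrieve`, and `build_prompt` are invented for the example, and no model is actually called.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Crude stand-in for an embedding: lower-cased word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    numerator = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return numerator / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query = bag_of_words(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query, bag_of_words(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Grounding: place the retrieved sources in the prompt before the question."""
    sources = "\n".join(f"- {doc}" for doc in retrieve(question, documents, k))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


documents = [
    "Quantisation stores model weights at lower precision to save memory.",
    "The context window is the maximum number of tokens a model can consider at once.",
    "LoRA fine-tunes a model by training a small set of added weights.",
]
question = "What is the context window of a model?"
prompt = build_prompt(question, documents)
print(prompt)
```

The resulting prompt is what would be sent to the model. Because the answer is present in the supplied source, the model has less need to rely on what it memorised in training, which is how RAG reduces hallucination.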