The first half of 2024 was unusually busy for AI vocabulary. Multimodal models became standard, autonomous agents moved from demos to products, and efficiency techniques that had lived in research papers showed up in shipping systems. This glossary captures the terms that crossed into mainstream coverage during that window, with definitions written for people who need to understand the news rather than build the models.
Agentic AI and AI agents
By mid-2024 the word agent had shifted meaning. An AI agent is a system built around a language model that can plan a task, decide which tools to call (search, code execution, APIs), act, observe the result, and loop until the goal is met. Agentic describes this autonomous, multi-step behaviour. The distinction from a chatbot matters: a chatbot responds to one prompt at a time, while an agent pursues an objective across many steps with minimal human intervention. Frameworks like LangChain, AutoGen, and CrewAI popularised the pattern.
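The plan, act, observe loop described above can be sketched in a few lines. This is a minimal illustration, not how LangChain, AutoGen, or CrewAI implement it: the tool names, the `toy_policy` function standing in for the language model, and the step limit are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools: plain functions standing in for search, code execution, or APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
    "upper": lambda arg: arg.upper(),
}

Step = Tuple[str, str, str]  # (action, argument, observation)


def toy_policy(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the language model: decide the next (action, argument).

    A real agent would prompt a model with the goal and the history here.
    """
    if not history:
        return ("add", goal)  # first step: call a tool
    last_observation = history[-1][2]
    return ("finish", last_observation)  # then stop and report what was observed


def run_agent(goal: str, max_steps: int = 5) -> str:
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = toy_policy(goal, history)  # plan / decide
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # act
        history.append((action, arg, observation))  # observe, then loop
    return "stopped: step limit reached"


print(run_agent("2+3"))  # prints 5
```

The step limit is the part worth noticing: because an agent keeps looping until it decides it is done, practical systems usually cap the number of steps so that a confused model cannot run indefinitely.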
Multimodality and omni models
A multimodal model accepts, and in some cases produces, more than one type of data: text, images, audio, and sometimes video. In 2024 this became the default expectation rather than a feature. OpenAI’s GPT-4o (the “o” stands for omni) processed text, images, and audio in a single model, while Google’s Gemini 1.5 emphasised long-context multimodal understanding. The trend turned “can it see the image?” into a baseline question for any serious model.
Mixture of experts (MoE)
Mixture of experts is an architecture where a model contains many specialised sub-networks (“experts”) and a lightweight router sends each token to only a small subset. The effect is a model with a very high total parameter count but much lower compute per token, because only the selected experts run for each token and the rest stay idle. Mixtral and several frontier models leaned on MoE to deliver strong quality at lower inference cost, making the term common in 2024 launch announcements.
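The routing idea can be shown with a toy example. The sketch below assumes top-k routing with softmax weights, one common formulation; the eight experts, the router scores, and the choice of k = 2 are illustrative values, not figures for any particular model.

```python
import math
from typing import Callable, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_scores: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalise their weights to sum to 1."""
    probs = softmax(router_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_forward(x: float, router_scores: List[float],
                experts: List[Callable[[float], float]], k: int = 2) -> float:
    """Run only the selected experts and blend their outputs by router weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(router_scores, k))


calls: List[int] = []  # records which experts actually ran


def make_expert(index: int, scale: float) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(index)
        return scale * x  # toy "sub-network": a single multiplication
    return expert


experts = [make_expert(i, scale=float(i + 1)) for i in range(8)]
scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]  # hypothetical router output for one token

output = moe_forward(1.0, scores, experts, k=2)
print(sorted(calls))              # [1, 4]: only two of eight experts ran
print(len(calls) / len(experts))  # 0.25 of the experts were active for this token
```

All eight experts count toward the model's total size, but only two did any work for this token, which is where the inference saving comes from.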
Context window expansion
A context window is how much text a model can consider at once, measured in tokens. Through 2024 these limits grew dramatically — Gemini 1.5 advertised windows of up to a million tokens. Bigger windows let models read entire books, codebases, or long transcripts in one pass, reducing the need for chunking and retrieval in some workflows. “Long context” became a key marketing axis.
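To see why a larger window reduces the need for chunking, consider the sketch below. It assumes a crude token count (one token per whitespace-separated word); real tokenizers split text differently, and the function name and the window sizes are hypothetical.

```python
from typing import List


def fit_or_chunk(text: str, window_tokens: int, reserve_for_answer: int = 0) -> List[str]:
    """Return the text whole if it fits the window, otherwise split it into chunks.

    Token counting is approximated by whitespace-separated words (an assumption).
    """
    budget = window_tokens - reserve_for_answer
    if budget <= 0:
        raise ValueError("window is too small once the answer reserve is subtracted")
    words = text.split()
    if len(words) <= budget:
        return [text]  # one pass, no chunking needed
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]


document = " ".join(f"w{i}" for i in range(10))  # a 10-"token" toy document

print(len(fit_or_chunk(document, window_tokens=4)))    # 3 chunks for a small window
print(len(fit_or_chunk(document, window_tokens=100)))  # 1: the whole document fits
```

With the small window the document has to be processed in three pieces, and anything that spans two pieces is harder for the model to connect. With the large window it goes through in one pass.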
Inference-time compute and reasoning
Late in the period, attention turned to spending more compute at answer time so models could reason through hard problems step by step, rather than relying solely on what they learned in training. This idea — letting a model “think longer” before responding — set the stage for the dedicated reasoning models that followed. It reframed quality as something you can buy with extra runtime, not just bigger training runs.
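One simple way to trade runtime for quality is to sample several answers and keep the most common one, often called majority voting or self-consistency. The sketch below uses that technique purely as an illustration; it is not a description of how any particular reasoning model works, and the canned answers stand in for calls to a real model.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def answer_with_budget(sample_answer: Callable[[], str], num_samples: int) -> str:
    """Spend more inference-time compute by drawing more samples before answering."""
    samples = [sample_answer() for _ in range(num_samples)]
    return majority_vote(samples)


def make_fake_model(canned: List[str]) -> Callable[[], str]:
    """Hypothetical stand-in for a model: replays a fixed list of sampled answers."""
    stream = iter(canned)
    return lambda: next(stream)


canned_answers = ["41", "42", "42", "17", "42"]  # illustrative samples, correct answer "42"

print(answer_with_budget(make_fake_model(canned_answers), num_samples=1))  # prints 41
print(answer_with_budget(make_fake_model(canned_answers), num_samples=5))  # prints 42
```

A single sample happens to be wrong here, while five samples agree on the right answer more often than on any wrong one. The cost is five times the compute at answer time, which is the trade-off the term describes.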
Treat any trend glossary as a snapshot. Terms here were current in mid-2024; verify exact capabilities and pricing against primary sources before relying on them, because the field moves faster than any single page can.