What a multi-agent system really is
A multi-agent AI system splits a goal across several specialised agents that coordinate to solve it. Each agent is just an LLM with a focused role, its own prompt, and a limited set of tools; orchestrated well, a group of them can handle work that overwhelms a single prompt. The pattern is not magic; it is classic software decomposition applied to model calls. The risk is that more agents mean more latency, more cost, and more ways to fail, so the discipline is to add agents only where a real boundary exists.
The orchestrator–worker pattern
A dependable default architecture is orchestrator plus workers. A planner agent (the orchestrator) receives the goal and decomposes it into sub-tasks, then routes each to a specialist worker such as a researcher, a coder, a critic, or a summariser. Crucially, the orchestrator coordinates but does not do the domain work; the workers execute but do not decide the overall plan. This separation is what makes the system inspectable: you can read the plan, see which worker ran each step, and pinpoint where things went wrong.
Sub-tasks that don’t depend on each other can run in parallel, which is one of the main performance wins of multi-agent design. The orchestrator then gathers the workers’ outputs and either synthesises a final answer itself or hands them to a dedicated aggregator agent.
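The flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names SubTask, plan, WORKERS, and orchestrate are hypothetical, the plan is hard-coded where a real orchestrator would ask a model for it, and the two workers are stand-ins for real LLM calls.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubTask:
    worker: str  # name of the specialist that should handle this step
    prompt: str  # instructions for that specialist


# Hypothetical stand-ins for real model calls; each would wrap an LLM request.
async def researcher(prompt: str) -> str:
    return f"[research] {prompt}"


async def summariser(prompt: str) -> str:
    return f"[summary] {prompt}"


WORKERS = {"researcher": researcher, "summariser": summariser}


def plan(goal: str) -> list[SubTask]:
    # Assumption: a fixed plan. A real orchestrator would generate this with a model.
    return [
        SubTask("researcher", f"find sources for: {goal}"),
        SubTask("summariser", f"summarise background on: {goal}"),
    ]


async def orchestrate(goal: str) -> dict:
    tasks = plan(goal)
    # The sub-tasks do not depend on each other, so they run concurrently.
    outputs = await asyncio.gather(*(WORKERS[t.worker](t.prompt) for t in tasks))
    # Keep the plan next to the results so the run stays inspectable.
    steps = [
        {"worker": t.worker, "prompt": t.prompt, "output": o}
        for t, o in zip(tasks, outputs)
    ]
    # Here the orchestrator synthesises the answer itself by simple concatenation.
    return {"goal": goal, "steps": steps, "answer": "\n".join(outputs)}


result = asyncio.run(orchestrate("vector databases"))
```

The orchestrator never does domain work here: it only plans, dispatches, and gathers. Swapping the final join for a call to an aggregator agent would not change the rest of the structure.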
Memory and shared state
Agents need to share results without drowning each other in context. The antipattern is one giant, ever-growing conversation that every agent reads: it bloats prompts, drives up cost, and quickly confuses the models. The fix is explicit shared state: a central store (a structured object, a key-value store, or a small database acting as a “blackboard”) where agents write intermediate results under named keys and read only what they need.
Distinguish short-term state (the current task’s working memory, discarded when the run ends) from long-term memory (facts persisted across runs, often in a vector store for semantic recall). Give each worker the minimum context for its job (the relevant slice of state plus its instructions), not the whole history. Small, targeted prompts are cheaper and faster, and they tend to produce more accurate output.
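As a rough sketch of the blackboard idea, the store below is an in-memory dictionary that covers short-term state only. The class name Blackboard, the helper build_prompt, and the dotted key names are illustrative assumptions; long-term memory in a vector store is out of scope for this example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Blackboard:
    """Short-term shared state for one run; discarded when the run ends."""

    _data: dict[str, Any] = field(default_factory=dict)

    def write(self, key: str, value: Any) -> None:
        # Agents publish intermediate results under a named key.
        self._data[key] = value

    def read(self, keys: list[str]) -> dict[str, Any]:
        # Return only the requested keys; keys that are not present are skipped.
        return {k: self._data[k] for k in keys if k in self._data}


def build_prompt(instructions: str, board: Blackboard, needs: list[str]) -> str:
    # The worker sees its instructions plus the slice it asked for, nothing else.
    context = "\n".join(f"{k}: {v}" for k, v in board.read(needs).items())
    return f"{instructions}\n\nContext:\n{context}"


board = Blackboard()
board.write("research.sources", ["paper A", "paper B"])
board.write("code.draft", "def f(): ...")
board.write("critic.notes", "tighten the intro")

# The summariser only needs the sources, so the draft and the notes stay out.
prompt = build_prompt("Summarise the sources.", board, needs=["research.sources"])
```

Because each worker declares the keys it needs, the prompt size is bounded by that slice and does not grow with the length of the run.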
Messaging, failure recovery, and limits
Agents communicate through structured messages, not free text — a typed envelope with a sender, a recipient, a task, and a payload. Structured messaging makes routing deterministic and lets you log and replay the conversation.
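One possible shape for such an envelope is shown below. The four fields come from the description above; the message_id field, the JSON serialisation, and the route helper are assumptions added here to illustrate logging and replay.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    task: str
    payload: dict[str, Any]
    # Assumption: an id for log correlation, not one of the four fields named in the text.
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        return cls(**json.loads(raw))


def route(message: Message, handlers: dict[str, Callable[[Message], Any]]) -> Any:
    # Routing is a lookup on the recipient field, not an interpretation of free text.
    return handlers[message.recipient](message)


log: list[str] = []
msg = Message("orchestrator", "researcher", "find_sources", {"topic": "vector databases"})
log.append(msg.to_json())  # every message is logged before it is delivered

# Replaying the run means rebuilding messages from the log and routing them again.
replayed = Message.from_json(log[0])
```

Since the log holds complete serialised envelopes, a failed run can be re-driven from any point by routing the stored messages again.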
Because any model call can fail, return garbage, or loop, failure recovery is a first-class concern, not an afterthought:
- Validate every handoff. Check a worker’s output against an expected schema before passing it on; on a malformed result, retry with feedback or escalate.
- Add a critic. A reviewer agent that inspects work before it’s accepted catches errors the producer can’t see — at the cost of extra calls.
- Cap everything. Enforce max steps per task, max tool calls per agent, a total token or spend budget, and a wall-clock timeout. These limits are what stop a polite infinite loop from emptying your account.
- Degrade gracefully. Define a safe fallback — return partial results with a clear note rather than crashing — and log every step with a run ID so you can trace and improve failures later.
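Three of the practices above (validating handoffs, capping work, and degrading gracefully) can be combined in one small wrapper. This is a sketch under stated assumptions: Budget, BudgetExceeded, validate, and run_with_recovery are hypothetical names, the expected schema is a single string field called summary, and flaky_worker stands in for a real model call. Token or spend budgets and the critic agent are left out for brevity.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


class BudgetExceeded(Exception):
    """Raised when a step or time cap is hit."""


@dataclass
class Budget:
    max_steps: int = 5
    timeout_s: float = 30.0
    steps: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self) -> None:
        # Call once before each model call; raises when a cap is exceeded.
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if time.monotonic() - self.started > self.timeout_s:
            raise BudgetExceeded("wall-clock timeout")


def validate(output: object) -> Optional[str]:
    """Return an error message, or None when the output has the expected shape."""
    if not isinstance(output, dict):
        return "output must be an object"
    if not isinstance(output.get("summary"), str):
        return "missing string field 'summary'"
    return None


def run_with_recovery(worker, task: str, budget: Budget, max_retries: int = 2) -> dict:
    run_id = uuid.uuid4().hex  # attach this id to every log line for the run
    feedback = None
    try:
        for _ in range(max_retries + 1):
            budget.charge()
            output = worker(task, feedback)
            error = validate(output)
            if error is None:
                return {"run_id": run_id, "status": "ok", "result": output}
            # Retry with feedback so the worker can correct its output.
            feedback = f"Previous output was rejected: {error}"
        reason = "retries exhausted"
    except BudgetExceeded as exc:
        reason = str(exc)
    # Safe fallback: report what happened instead of crashing.
    return {"run_id": run_id, "status": "partial", "result": None, "note": reason}


def flaky_worker(task: str, feedback: Optional[str]):
    # Hypothetical worker: returns malformed output until it receives feedback.
    if feedback is None:
        return "not json"
    return {"summary": f"done: {task}"}


outcome = run_with_recovery(flaky_worker, "summarise", Budget())
```

With the default budget, the first attempt is rejected by validate and the second succeeds. With Budget(max_steps=1) the same call returns a partial result whose note names the step limit, which is the behaviour the caps are meant to guarantee.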
Build the smallest system that works, instrument it heavily, and add agents only when measurement shows a single one can’t keep up. Frameworks like CrewAI, AutoGen, and LangGraph implement these patterns for you and are often the fastest way to a working system; reach for custom code when you need control they don’t give.