Multi-Agent Orchestration
When one agent isn’t enough, multi-agent systems coordinate several specialists. The patterns range from trivial (a router calls one of N agents) to elaborate (debating critics, hierarchical teams). Most production multi-agent systems are simpler than they look.
When to go multi-agent
Strong signals for multi-agent:
- Distinct specializations: one agent does retrieval, another does code, another does writing.
- Different access scopes: a “search” agent can’t access “purchase” tools.
- Parallelism: independent sub-tasks can run concurrently.
- Different model classes: a cheap router decides; an expensive specialist executes.
- Different system prompts that conflict if combined.
Signals you don’t need multi-agent (just more tools / better single agent):
- Fewer than ~10 tools.
- One coherent personality.
- Mostly sequential workflow.
- Latency is critical (each agent hop adds latency).
Pattern 1 — Router / orchestrator
A single coordinating agent decides which specialist to invoke for each subtask:
User → Router agent (cheap, broad) → picks specialist → specialist executes → returns
         ↑_________________________________________________________________________|
                                  (loops until done)
Specialists are themselves agents (or simple LLM calls). The router:
- Sees the user request.
- Picks a specialist.
- Waits for the result.
- Decides next step (call another, finalize).
Used by Cursor (router → code agent), Claude’s project-routing patterns, support-ticket triage systems.
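A minimal sketch of the router loop, assuming a `call_llm` helper and a registry of specialist callables (both hypothetical stand-ins for real LLM and agent calls):

```python
# Router loop sketch. `call_llm` returns a routing decision such as
# {"action": "search", "input": "..."} or {"action": "finish", "answer": "..."}.
def route(request, specialists, call_llm, max_hops=5):
    history = [f"User request: {request}"]
    for _ in range(max_hops):
        # Router picks a specialist (or finishes) based on history so far.
        decision = call_llm(history)
        if decision["action"] == "finish":
            return decision["answer"]
        # Invoke the chosen specialist and feed its result back to the router.
        result = specialists[decision["action"]](decision["input"])
        history.append(f"{decision['action']} returned: {result}")
    return "Gave up after max hops"
```

Capping `max_hops` matters: a router that keeps re-delegating is one of the cost-runaway modes discussed below.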
Pattern 2 — Pipeline
Fixed sequence: agent A → agent B → agent C. Each does its part; passes to the next.
Researcher agent → Writer agent → Editor agent → Final draft
Pros:
- Simple to reason about.
- Each agent has a focused job.
Cons:
- Hard to recover from mid-pipeline failures.
- Inflexible — each task must fit the pipeline shape.
Common in content-generation systems.
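The pipeline shape reduces to function composition; a sketch with hypothetical stage functions standing in for agent calls:

```python
# Pipeline sketch: each stage is an agent taking the previous stage's output.
def run_pipeline(task, stages):
    artifact = task
    for name, stage in stages:
        # A failure here loses all downstream work; real pipelines
        # checkpoint each artifact so they can resume mid-sequence.
        artifact = stage(artifact)
    return artifact

stages = [
    ("research", lambda t: f"notes on {t}"),
    ("write",    lambda notes: f"draft from {notes}"),
    ("edit",     lambda draft: f"polished {draft}"),
]
```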
Pattern 3 — Supervisor / worker
A supervisor decomposes the task into independent worker assignments, runs them (often in parallel), aggregates:
Supervisor: Decomposes → spawns N workers → aggregates results
↓
Worker 1, Worker 2, ..., Worker N (parallel)
Used heavily in research agents and large code-modification agents. The supervisor is often the only agent that sees the full context; workers operate on focused subtasks with smaller context windows.
This is exactly Claude Code’s pattern when it spawns sub-agents.
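A sketch of the decompose/fan-out/aggregate shape using thread-based parallelism; `decompose`, `worker`, and `aggregate` are hypothetical stand-ins for agent calls:

```python
# Supervisor/worker sketch: decompose, fan out in parallel, aggregate.
from concurrent.futures import ThreadPoolExecutor

def supervise(task, decompose, worker, aggregate):
    subtasks = decompose(task)
    with ThreadPoolExecutor() as pool:
        # Workers run concurrently; each sees only its own subtask,
        # mirroring the smaller context windows described above.
        results = list(pool.map(worker, subtasks))
    return aggregate(results)
```

Threads suffice here because agent calls are I/O-bound (waiting on model APIs), so the GIL is not a bottleneck.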
Pattern 4 — Debate / critic
Two or more agents take adversarial positions; their disagreement surfaces problems:
Proposer: Here's a solution X.
Critic: But X has these flaws.
Proposer: Revised: X'.
Critic: X' addresses Y but not Z.
...
Judge: Verdict — accept X' with caveat Z.
Used for high-stakes decisions, code review, factual correctness. Costs more than single-agent but catches errors that don’t surface with one perspective.
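The transcript above can be generated by a bounded alternation loop; `propose`, `critique`, and `judge` below are hypothetical LLM wrappers:

```python
# Debate sketch: proposer and critic alternate for a bounded number
# of rounds; a judge renders a verdict from the full transcript.
def debate(question, propose, critique, judge, rounds=3):
    proposal = propose(question, feedback=None)
    transcript = [("proposer", proposal)]
    for _ in range(rounds):
        flaws = critique(question, proposal)
        transcript.append(("critic", flaws))
        if not flaws:  # critic is satisfied; stop early
            break
        proposal = propose(question, feedback=flaws)
        transcript.append(("proposer", proposal))
    return judge(question, transcript)
```

The round cap is the cost control mentioned later: without it, a stubborn critic can keep the loop alive indefinitely.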
Pattern 5 — Swarm / consensus
Many agents each attempt the task independently; results are aggregated via voting, majority, or ranking.
Equivalent to self-consistency at the agent level. Expensive; sometimes worth it for hard problems with verifiable rewards.
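Majority voting at the agent level is a short loop plus a counter; `attempt` is a hypothetical agent call whose answers vary across runs:

```python
# Swarm sketch: N independent attempts, majority vote on the answer.
from collections import Counter

def swarm(task, attempt, n=5):
    answers = [attempt(task, seed=i) for i in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n  # answer plus agreement ratio
```

The agreement ratio is useful on its own: low agreement is a signal to escalate to a stronger model or a human.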
Pattern 6 — Hierarchy
Recursive: an agent at level N can spawn agents at level N+1, who can spawn level N+2, etc.
Top-level coordinator
↓ delegates research subtask
Research lead
↓ delegates fact-finding
Fact-finder, fact-finder, fact-finder
Mirrors human organizational structures. Useful for very large tasks; risky operationally (depth = cost = error compounding).
Pattern 7 — Two-tier model routing
Cheap-then-expensive:
Haiku 4.5 (cheap, fast) → handles 80% of queries
↓ on uncertainty / hard question
Sonnet 4.6 (capable, slower) → handles 20%
Strictly speaking not “multi-agent” — same agent loop, model swap. But operationally similar. Saves significant cost.
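A sketch of the escalation logic. The confidence field is an assumption; real systems derive it from logprobs, a self-reported score, or heuristic rules:

```python
# Two-tier routing sketch: try the cheap model, escalate on low confidence.
def answer(query, cheap_model, expensive_model, threshold=0.8):
    draft = cheap_model(query)  # assumed to return {"text": ..., "confidence": ...}
    if draft["confidence"] >= threshold:
        return draft["text"]
    return expensive_model(query)["text"]  # escalate the hard minority
```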
Communication between agents
How agents talk to each other:
Structured messages
Agent A emits JSON conforming to a shared schema; agent B consumes the parsed result:
# A produces:
{"task": "summarize", "input": "...", "constraints": {...}}
# B consumes that schema directly.
Reliable, schema-validated, debuggable.
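"Schema-validated" can be as simple as checking required fields before the handoff. A stdlib-only sketch (production systems typically use a schema library instead):

```python
# Validate agent A's output before handing it to agent B.
import json

REQUIRED = {"task", "input", "constraints"}

def parse_handoff(raw):
    msg = json.loads(raw)
    missing = REQUIRED - msg.keys()
    if missing:
        # Fail loudly at the boundary instead of letting agent B
        # act on a malformed message.
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    return msg
```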
Shared blackboard / memory
Agents read/write a common state object:
state = {
"task": "...",
"research": [...],
"draft": "...",
"review_comments": [],
}
Each agent updates the shared state. The orchestrator decides who runs next based on state.
LangGraph follows this model; many production systems do.
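A sketch of an orchestrator that picks the next agent by inspecting the shared state; the agent names and readiness checks are illustrative, not LangGraph's API:

```python
# Blackboard sketch: run whichever agent the state says is needed next.
def orchestrate(state, agents):
    while True:
        if not state["research"]:
            agents["researcher"](state)   # each agent mutates shared state
        elif not state["draft"]:
            agents["writer"](state)
        elif not state["review_comments"]:
            agents["reviewer"](state)
        else:
            return state  # all slots filled; done
```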
Handoff messages
One agent emits a message to another; framework routes it:
agent_A.send(to="agent_B", content="...")
Clean for multi-step protocols; can complicate debugging if not logged carefully.
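Behind a `send` call there is usually just a per-agent message queue; a minimal bus sketch (the API here is hypothetical, not a specific framework's):

```python
# Message-bus sketch: framework-side routing behind agent.send(...).
from collections import defaultdict, deque

class Bus:
    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, to, content, sender):
        # Logging every message here is the cheapest observability win.
        self.queues[to].append({"from": sender, "content": content})

    def receive(self, agent):
        q = self.queues[agent]
        return q.popleft() if q else None
```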
Frameworks
- LangGraph: explicit graph of states/transitions; popular.
- AutoGen (Microsoft): general-purpose multi-agent framework.
- CrewAI: role-based (“you are the researcher”, “you are the writer”).
- OpenAI Assistants / Agents SDK: thin layer over OpenAI; multi-agent via assistants.
- Anthropic Agent SDK: Claude-native, single + multi-agent patterns.
You can also build all of this with raw Python loops + tool calls. Frameworks add ergonomics, not capabilities.
Cost & latency
Each agent hop is at least one LLM call. A 5-agent pipeline = 5+ LLM calls. Costs and latency add up:
- Cache shared context across agents.
- Use cheap models for routers/aggregators.
- Run independent agents in parallel.
- Cap iterations on debate / consensus patterns.
Failure modes specific to multi-agent
- Telephone-game errors: information loss as it passes through agents.
- Orphaned tasks: a worker’s result is dropped; the orchestrator forgets.
- Conflicting outputs: two agents produce contradictory results.
- Deadlock: agent A waits for B, B waits for A.
- Silent context loss: agent B doesn’t see what A saw, makes a wrong decision.
- Cost runaway: each hop looks cheap in isolation; the system as a whole is expensive.
When multi-agent isn’t worth it
A common anti-pattern: building a 5-agent system where a single agent with the same tools would do better.
The test: take the multi-agent system, flatten it into one agent with all the tools. If the single agent performs comparably, you didn’t need multi-agent.
You do need multi-agent if:
- The agents must have different system prompts that conflict.
- Different agents need different access (security boundaries).
- Real parallelism saves wall-clock time.
Otherwise, prefer one capable agent over many narrow ones.
Observability
Multi-agent traces are harder to read than single-agent. Invest early:
- Log every inter-agent message.
- Tag traces by parent agent.
- Visualize agent graphs in your tracing tool (Langfuse, Phoenix, custom).
- Replay sessions for debugging.
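"Log every inter-agent message" and "tag traces by parent agent" can share one record shape; a sketch with illustrative field names (not a specific tracing tool's schema):

```python
# Tagged inter-agent message log: parent spans let a viewer
# rebuild the agent tree and replay a session in order.
import time
import uuid

def log_message(log, sender, receiver, content, parent_span=None):
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "from": sender,
        "to": receiver,
        "parent": parent_span,  # None for top-level messages
        "content": content,
    }
    log.append(record)
    return record["id"]  # pass this as parent_span for child messages
```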
Watch it interactively
- Multi-Agent — supervisor + 3 workers + critic with failure-injection toggle (transient error / timeout) and retry-budget slider. Predict before clicking: with retry=0 and one worker failing, the final answer is visibly degraded (“⚠ no food recommendations — Food worker failed and supervisor’s retry budget was 0”); flip retry to 1 and the same run produces the full answer. The integration prompt panel shows what the supervisor literally sends back to the LLM.
- Agent Trace Viewer — single-agent traces with failure injection. Multi-agent is just nested versions of this loop.
Build it in code
- /ship/11 — multi-agent orchestration — Supervisor/Worker/Critic/Orchestrator in ~200 lines. Includes the honest “skip the orchestrator” flowchart.
- /case-studies/03 — research assistant — multi-agent fan-out for cited briefs, with the cost/latency math (3× tokens, 2× wall-clock savings, +0.8 quality lift) measured on a real query.