Stage 11 — Agents

An agent is an LLM in a loop with tools. The model takes an action, observes the result, decides what to do next, and repeats until done. This loop is the basic unit of every “AI agent” you’ve seen — Claude Code, ChatGPT browsing, code-fixing bots, customer support agents, browser agents.

The architecture is simple. Making it reliable in production is not.

Prerequisites

Stage 08 (prompting, structured outputs)
Stage 09 (RAG, for retrieval-augmented agents)

Learning ladder

Agent loop & architecture — the core pattern
Tool use & function calling — how the model invokes external capabilities
Memory systems — working, episodic, semantic
Planning & reflection — ReAct, plan-and-execute, reflexion
Multi-agent orchestration — supervisor, swarm, debate
Guardrails & safety — keeping agents inside their lane
Browser & vision agents — the embodied frontier

MVU

You can:

Build a single-agent loop in <100 lines of code without a framework
Define tools with clean schemas and good error semantics
Articulate when to add a second agent vs more tools to one
Prevent the most common failure modes (loops, drift, runaway cost)

Exercise

Build an agent that can search the web, read pages, and answer multi-hop questions. No agent framework allowed for the first version. Then add: a tool registry, basic memory (summarize old turns), retry logic, a budget cap. Then ask: would a framework actually help me here?

Why this stage matters

In 2026, “agents” is what most product teams want to ship. Most of them ship something that works in demos but fails in production. The difference is in this stage’s content: tool design, memory management, error handling, evaluation.

Hands-on companions

This stage has the most code-side companion content on the site. After the theory:

Ship the agent stack:

/ship/09 — tools and function calling — tool registry, JSON-schema from type hints, OSS-model adapters
/ship/10 — build the agent loop — three-axis budgets, history pruning, named failure modes (thrashing, premature giving-up, format drift)
/ship/11 — multi-agent orchestration — supervisor / workers / critic, plus an honest “skip the orchestrator” flowchart

See it as a real product — three case studies, increasing in complexity:

/case-studies/02 — code-review agent — propose-then-act tools, action-rate as the metric, when not to comment
/case-studies/03 — research assistant — multi-agent fan-out, real cost/latency benchmark, synthesis-not-concatenation
/case-studies/04 — customer-support bot — RAG + tools + escalation logic; the product that composes everything