Agent Loop & Architecture

An agent is an LLM in a loop. Strip away frameworks and the pattern is small enough to fit on a napkin.

The core loop

loop:
    response = llm(messages)
    if response is final answer:
        return response
    if response is tool call:
        result = execute(tool_call)
        messages.append(tool_call)
        messages.append(result)
        continue
    raise UnexpectedResponse

That’s an agent. Ten lines of pseudo-code, roughly fifty lines of real code. Everything else — frameworks, abstractions, multi-agent systems — is built on this.

A minimal agent

import anthropic
from typing import Callable

client = anthropic.Anthropic()

def extract_text(response) -> str:
    """Concatenate the text blocks in the model's final response."""
    return "".join(block.text for block in response.content if block.type == "text")

def run_agent(task: str, tools: list[dict], handlers: dict[str, Callable], max_steps: int = 20):
    messages = [{"role": "user", "content": task}]

    for step in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # The model answered directly: we're done.
        if response.stop_reason == "end_turn":
            return extract_text(response)

        if response.stop_reason == "tool_use":
            # Echo the assistant turn (including its tool_use blocks) into history.
            messages.append({"role": "assistant", "content": response.content})

            tool_results = []
            for block in response.content:
                if block.type != "tool_use":
                    continue
                try:
                    result = handlers[block.name](**block.input)
                except Exception as e:
                    # Feed errors back so the model can self-correct.
                    result = f"Error: {e}"
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result),
                })

            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

    raise RuntimeError("Agent exceeded max_steps")

Define a few tools, supply handlers, and you have a working agent.

The four ingredients

1. The model

The model must support native tool use; modern frontier models all do.

2. The tool definitions

A list of available tools, each with:

  • A name (unique)
  • A description (what it does, when to use)
  • An input schema (JSON Schema)
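A single tool definition in the Anthropic tool-use format puts those three pieces together. This weather tool is a made-up example:

```python
# A unique name, a usage-oriented description, and a JSON Schema
# for the inputs. The weather tool itself is illustrative.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city. Use this whenever the user asks about weather conditions.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
        },
        "required": ["city"],
    },
}
```

The description doubles as documentation for the model, so write it the way you'd brief a new teammate: what the tool does and when to reach for it.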

3. The handlers

Functions that actually execute when the model calls a tool.
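Handlers are plain functions keyed by tool name, with keyword arguments mirroring the input schema. This stub is illustrative; a real handler would call an actual service:

```python
def get_weather(city: str) -> str:
    # Stubbed for illustration; a real handler would call a weather API.
    return f"Sunny, 22°C in {city}"

# The loop looks up handlers by the tool name the model emitted.
handlers = {"get_weather": get_weather}
```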

4. The loop

Drives the conversation forward until the model emits an end-turn response.

Why this works

Modern LLMs are trained to:

  • Decide whether to call a tool or answer directly.
  • Format tool calls as structured JSON matching your schema.
  • Reason about tool results and continue.

You don’t need to parse “Action: Search\nObservation: …” text — that was the 2022 ReAct era. Native tool use is cleaner, more reliable, and supported by every major API in 2026.
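Concretely, a native tool call arrives as a structured content block, and you answer with a matching tool_result block (shapes per the Anthropic tool-use format; the id and values here are illustrative):

```python
# What the model emits when it calls a tool:
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_01A",          # illustrative id
    "name": "get_weather",
    "input": {"city": "Paris"},
}

# What you send back; tool_use_id links the result to the call.
tool_result_block = {
    "type": "tool_result",
    "tool_use_id": "toolu_01A",
    "content": "Sunny, 22°C",
}
```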

What can go wrong

The model loops

Calls the same tool repeatedly with similar inputs. Common cause: the tool’s response isn’t progressing the task.

Fixes:

  • Cap iterations (max_steps).
  • Detect repetition; inject a system message (“you’ve already tried that”).
  • Improve tool error messages so the model can self-correct.
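Repetition detection can be a few lines: key each call by (tool name, serialized inputs) and check the recent window. `is_repeat` and the window size are illustrative, not part of any SDK:

```python
import json

def is_repeat(call_history: list[tuple[str, str]], name: str, inputs: dict, window: int = 3) -> bool:
    """True if this exact (tool, inputs) pair appeared in the last `window` calls."""
    key = (name, json.dumps(inputs, sort_keys=True))  # sort_keys makes the key order-insensitive
    return key in call_history[-window:]

# Inside the loop, before executing a tool call (sketch):
# if is_repeat(history, block.name, block.input):
#     result = "You've already tried this exact call; try a different approach."
# history.append((block.name, json.dumps(block.input, sort_keys=True)))
```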

The model gives up too easily

Returns “I can’t help with that” when it should try a tool.

Fixes:

  • Better tool descriptions (“Use this when …”).
  • Few-shot examples in the system prompt.
  • Provide a fallback tool (“escalate to human”) so the model has somewhere to go.

The model hallucinates tool calls

Calls non-existent tools, or invents fields.

Fixes:

  • Strict JSON schema (enforced by the API).
  • Validate tool calls before executing; return clear error to the model.
  • Use tool-trained models; older or smaller models hallucinate more.

Runaway cost

A single user task can rack up many tool calls and many tokens. Without limits, your bill scales unpredictably.

Fixes:

  • Per-task token budget.
  • Per-task tool-call budget.
  • Per-task wall-clock budget.
  • Logging + alerting on outliers.

Stale context

After 30 turns, the agent’s context is full of old tool results. Quality degrades.

Fixes:

  • Compaction: summarize old turns periodically.
  • Selective pruning: drop verbose tool outputs once they’ve been used.
  • External memory: store details outside context, retrieve as needed.
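Selective pruning can be as simple as stubbing out tool_result content in all but the most recent messages. `prune_tool_results` and its defaults are illustrative:

```python
def prune_tool_results(messages: list[dict], keep_last: int = 4, stub: str = "[tool output pruned]") -> list[dict]:
    """Replace tool_result content in all but the last `keep_last` messages."""
    pruned = []
    for i, msg in enumerate(messages):
        is_old = i < len(messages) - keep_last
        content = msg.get("content")
        if is_old and isinstance(content, list):
            # Keep the block (so tool_use_id pairing stays intact), drop its bulk.
            content = [
                {**blk, "content": stub} if blk.get("type") == "tool_result" else blk
                for blk in content
            ]
            msg = {**msg, "content": content}
        pruned.append(msg)
    return pruned
```

Keeping the stubbed block in place matters: each tool_use in history still needs a matching tool_result.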

We unpack this in memory-systems.md.

Single-agent vs multi-agent

A single agent with N tools is usually simpler than N agents with one tool each. Multi-agent makes sense when:

  • Different agents need very different system prompts (a writer vs an editor).
  • You want parallelism (one agent dispatches sub-tasks).
  • You’re enforcing different access scopes (a “search” agent that can’t access “purchase” tools).
  • Latency matters and you can fan out.

Otherwise, start with a single agent. Add agents only when one isn’t enough.

Streaming

Stream the model’s output for real-time UX:

  • Display partial responses as they arrive.
  • Surface tool calls before they execute.
  • Show tool results as they come in.

Most APIs support streaming. The agent loop logic is similar; you just process the stream incrementally.

Long-running agents

For tasks that take minutes to hours:

  • Persist agent state to disk/DB so it survives crashes.
  • Implement checkpointing — restart from any step.
  • Surface progress to the user (heartbeats, progress bars).
  • Add cancellation paths (don’t let it run forever).
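Persistence and checkpointing can be sketched by saving the messages list after each step; the helper names are hypothetical, and the atomic tmp-then-replace avoids torn files on crash:

```python
import json
from pathlib import Path

def save_checkpoint(path: Path, messages: list[dict], step: int) -> None:
    """Persist loop state atomically: write to a tmp file, then rename into place."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"step": step, "messages": messages}))
    tmp.replace(path)

def load_checkpoint(path: Path) -> tuple[list[dict], int]:
    """Resume from the last saved step, or start fresh if no checkpoint exists."""
    if not path.exists():
        return [], 0
    state = json.loads(path.read_text())
    return state["messages"], state["step"]
```

Call `save_checkpoint` at the end of each loop iteration and `load_checkpoint` on startup, and a crash costs you at most one step.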

Frameworks

When are they helpful?

  • Anthropic Agent SDK / OpenAI Assistants / Vertex Agents: thin wrappers over the API; useful for quick start.
  • LangGraph: explicit state machines, good for complex routing.
  • LlamaIndex agents: tightly integrated with their RAG stack.
  • smolagents (HuggingFace): code-execution-first agents, lightweight.
  • CrewAI, AutoGen: multi-agent orchestration.

Don’t reach for a framework before you’ve built the loop yourself. Once you have, you’ll know what abstractions you actually need (and which ones the framework gets wrong for your use case).

State of the art

By early 2026, the best agents:

  • Use a single capable model (frontier with tool use + reasoning).
  • Have 5–20 well-designed tools (not 100 tools — too much choice paralysis).
  • Maintain active memory with summarization and retrieval.
  • Have reliable termination criteria (the model knows when to stop).
  • Run parallel sub-tasks when independent.
  • Are observable with full traces of every decision.

This is the architecture behind Claude Code, Cursor, and many top coding/research agents.

Watch it interactively

  • Agent Trace Viewer — full agent loops with thought / action / observation / final answer typed-step rendering. Toggle the failure-injection mode (none / transient error / permanent error) and watch how retry/give-up reshapes the trace. The recovery pattern is on screen.
  • ReAct Lab — toggle reflection on/off on the same run, including a real failing run where reflection actually catches the wrong answer (rope-cutting fence-post error). Plus a playground panel where the calculator action runs for real on user expressions.
  • Tool Use Builder — schema → call → result protocol, made explicit.
  • Multi-Agent — when one agent isn’t enough; supervisor + workers + critic with failure-injection.

Build it in code

See also