Planning & Reflection

The naive agent loop reacts step by step. For complex tasks, it can drift, miss obvious shortcuts, or fail without learning. Planning and reflection patterns add structure and self-correction.

ReAct (Reason + Act)

Introduced by Yao et al. (2022), this is the original agentic prompting pattern:

Thought: I need to find X. Let me search.
Action: search("X")
Observation: <result>
Thought: That's not quite right. Let me try a different query.
Action: search("Y")
Observation: <result>
Thought: Now I have enough to answer.
Final Answer: ...

Before native tool use, this was implemented by parsing "Action:" lines out of the model's text. Today, native tool calling does the same thing with cleaner mechanics, but the underlying pattern — reason, act, observe, repeat — is unchanged.
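
With native tool calling, the loop looks roughly like this. A minimal, self-contained sketch: the `model` function, the `search` tool, and the message shapes are simplified stand-ins, not a real provider SDK.

```python
# ReAct with native tool calling: reason, act, observe, repeat.
# Everything here is a stand-in; a real agent would call a provider SDK
# and read tool-call requests out of the model's response.

def search(query: str) -> str:
    # Stand-in tool.
    return {"X": "partial match", "Y": "exact match"}.get(query, "no results")

TOOLS = {"search": search}

def model(messages):
    # Scripted stand-in for the LLM: searches "X", retries with "Y", answers.
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return {"type": "tool_call", "name": "search", "args": {"query": "X"}}
    if tool_turns == 1:
        return {"type": "tool_call", "name": "search", "args": {"query": "Y"}}
    return {"type": "final", "content": "Answer based on: " + messages[-1]["content"]}

def react_loop(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = model(messages)                                # reason
        if step["type"] == "final":
            return step["content"]
        result = TOOLS[step["name"]](**step["args"])          # act
        messages.append({"role": "tool", "content": result})  # observe
    raise RuntimeError("step budget exhausted")
```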

Plan-and-Execute

Before acting, generate a high-level plan:

Step 1: Plan
  Plan:
  1. Search for X
  2. Read top 3 results
  3. Compare findings
  4. Synthesize answer

Step 2: Execute each step
  Step 1: search("X") → results
  Step 2: read(results[0]), read(results[1]), read(results[2]) → contents
  Step 3: <reason about contents>
  Step 4: produce answer

Pros:

  • Forces upfront thinking; less drift.
  • Easier to display progress to users.
  • Sub-steps can be parallelized.

Cons:

  • Plans can be wrong; replanning is needed.
  • Adds an extra LLM call before action.

A common pattern in research agents and customer-support agents.
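
The control flow can be sketched in a few lines. `plan_llm` and `execute_llm` are hypothetical stand-ins for the two kinds of model call (the extra upfront call is the planning step):

```python
# Plan-and-execute sketch: one planning call, then one call per step.

def plan_llm(task):
    # Stand-in: a real agent would ask the model for a numbered plan
    # and parse it into steps.
    return ["search for X", "read top results", "compare findings",
            "synthesize answer"]

def execute_llm(step, context):
    # Stand-in: a real agent would run tools / model calls per step.
    return f"done: {step}"

def plan_and_execute(task):
    plan = plan_llm(task)          # step 1: plan upfront
    context = []
    for step in plan:              # step 2: execute each step in order
        context.append(execute_llm(step, context))
    return context[-1]             # the final step produces the answer
```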

Plan-Replan loop

Fixed plans rarely survive contact with reality. A robust agent replans when:

  • An action fails unexpectedly.
  • The current plan no longer makes sense given new information.
  • A budget cap is approached.

plan, history = make_plan(task), []
while plan:
    step = plan.pop(0)
    result = execute(step)
    history.append(result)
    if needs_replan(result, plan):
        plan = make_plan(task, history)  # replan with what has been learned

Reflection / Reflexion

Shinn et al. (2023). After failing a task, the agent generates a written reflection (“here’s what I tried, here’s what didn’t work, here’s what I’ll do differently”) and stores it in memory. Subsequent attempts use the reflection as context.

Attempt 1: <output> → fail (test fails)
Reflection: I assumed the API returns objects but it returns lists.
Attempt 2 (with reflection in context): <better output>

For tasks with feedback signals (test results, user thumbs-up/down), Reflexion meaningfully improves iterative agents.
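
The attempt-reflect loop is compact. In this sketch, `attempt` and `reflect` are scripted stand-ins for model calls; the point is the episodic memory carried across attempts:

```python
# Reflexion sketch: failed attempts produce a written reflection that is
# fed back into the next attempt's context.

def attempt(task, reflections):
    # Stand-in: succeeds only once the key insight is in context.
    if any("returns lists" in r for r in reflections):
        return "handles list response", True
    return "assumed object response", False

def reflect(task, output):
    # Stand-in: a real agent would ask the model to explain the failure.
    return "I assumed the API returns objects but it returns lists."

def reflexion_loop(task, max_attempts=3):
    reflections = []                      # memory across attempts
    for _ in range(max_attempts):
        output, ok = attempt(task, reflections)
        if ok:
            return output
        reflections.append(reflect(task, output))
    return None                           # budget exhausted
```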

Self-critique

Before submitting a final answer, the agent critiques its own draft:

Draft: <answer>
Critique: Does this address the user's question? Is anything missing? Any errors?
Final: <revised answer>

Often catches obvious mistakes. Adds 1.5–2× latency for the second pass.
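
A sketch of the draft-critique-revise shape. The three functions are stand-ins for model calls; the critique pass is what costs the extra latency:

```python
# Self-critique sketch: draft, critique, revise once.

def draft(question):
    return "Paris is the capital of France"

def critique(question, answer):
    # Stand-in: a real critique asks "does this address the question?
    # anything missing? any errors?" and returns the issues found.
    if "population" in question and "population" not in answer:
        return "missing the population figure"
    return None  # no issues found

def revise(question, answer, issue):
    return answer + " (population ~2.1M in the city proper)"

def answer_with_critique(question):
    candidate = draft(question)
    issue = critique(question, candidate)   # the second pass
    if issue:
        candidate = revise(question, candidate, issue)
    return candidate
```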

Tree of Thoughts (ToT) for agents

Yao et al. (2023). Branch on different actions, explore each, score, pick best. For agents, this is action-level search:

Current state
├── Try action A → result A → score
├── Try action B → result B → score
└── Try action C → result C → score
   → pick best path; continue from there

Expensive (multiplies API calls) but sometimes the only way to handle adversarial or high-uncertainty problems.
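
A toy sketch of the search skeleton. Here the state is a number and actions add to it (pure stand-ins); in a real agent, `simulate` is a model rollout per branch, which is where the multiplied API cost comes from:

```python
# Action-level search sketch: expand each candidate action one step,
# score the results, continue from the best branch.

def candidate_actions(state):
    return [1, 2, 3] if state < 6 else []   # stand-in action space

def simulate(state, action):
    return state + action                   # stand-in rollout

def score(state):
    return -abs(state - 6)                  # stand-in: prefer states near 6

def tree_search(state, actions, simulate, score, depth=2):
    if depth == 0 or not actions(state):
        return state
    branches = [simulate(state, a) for a in actions(state)]
    best = max(branches, key=score)         # pick best path
    return tree_search(best, actions, simulate, score, depth - 1)
```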

Verification loops

Some tasks have a verifier — a way to check if the answer is correct (compiler, tests, schema validator). Use it inside the agent:

def solve(context, max_attempts=5):
    for _ in range(max_attempts):
        answer = generate(context)
        if verify(answer):
            return answer
        # add verifier feedback to context and try again
        context.append(verifier_message(answer))
    return None  # budget exhausted

The agent gets ground-truth feedback without a human in the loop. Used heavily by code agents (Claude Code, Cursor’s agent) and math agents.

Decomposition

For multi-part tasks, decompose into independent subtasks. Each subtask is a sub-agent or a function call:

Task: "Compare the security features of products A, B, C"

Subtasks:
  - Get security features of A
  - Get security features of B
  - Get security features of C
  - Compare and summarize

Each subtask runs (potentially in parallel); results aggregated.

Critical for keeping context windows manageable on big tasks.
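
The fan-out/aggregate shape can be sketched with threads. `research` stands in for a sub-agent run with its own small context; only its result, not its full transcript, reaches the aggregation step:

```python
# Decomposition sketch: independent subtasks run in parallel,
# then one final step aggregates.
from concurrent.futures import ThreadPoolExecutor

def research(product):
    # Stand-in for a sub-agent call.
    return f"{product}: SSO, audit logs"

def compare_security(products):
    with ThreadPoolExecutor() as pool:
        findings = list(pool.map(research, products))  # parallel fan-out
    return "Comparison:\n" + "\n".join(findings)       # aggregation step
```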

Reasoning models change this

Modern reasoning models (Claude with extended thinking, o-series, R1) internalize much of this:

  • They plan before acting.
  • They reflect during reasoning.
  • They self-critique implicitly.

Asking a reasoning model to “make a plan, then execute” can be redundant — it does this naturally.

For non-reasoning models, explicit planning patterns still help.

When reasoning is overkill

For simple, narrow tool use (e.g. “look up the order status”), reasoning patterns are wasteful. Use them when:

  • Tasks have multiple plausible approaches.
  • Failure modes are subtle.
  • The agent needs to recover from errors gracefully.
  • The cost of a wrong answer is high.

Operational concerns

Loops

Reflection and planning patterns can trigger loops where the agent keeps “reflecting” without progress.

Mitigations:

  • Per-task max-iterations.
  • Detect repetitive plans/reflections.
  • Force progress: each loop iteration must do something (call a tool, output a deliverable).
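
The first two mitigations fit in a small wrapper. A sketch: `step_fn` stands in for one agent iteration, and an exact-repeat check approximates "detect repetitive plans/reflections" (real agents might use fuzzier similarity):

```python
# Loop-guard sketch: cap iterations and bail out when the agent repeats
# itself without making progress.

def run_guarded(step_fn, max_iters=10):
    seen = set()
    for i in range(max_iters):
        output, done = step_fn(i)
        if done:
            return output
        if output in seen:       # exact repeat: no progress
            raise RuntimeError("agent is looping: repeated output")
        seen.add(output)
    raise RuntimeError("max iterations reached")
```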

Context cost

Plans, reflections, critiques all consume tokens. A heavily-reasoning agent on a 50-step task can use 100k+ tokens just for reasoning.

Mitigations:

  • Compact / summarize old plans and reflections.
  • Use a smaller, cheaper model for sub-tasks (e.g. planning with Haiku, execution with Sonnet).
  • Cap reasoning budgets per step.
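
The compaction mitigation can be sketched as: keep the most recent plans/reflections verbatim and collapse older ones into a one-line summary. The summarizer here is a crude truncation stand-in; real agents typically use a cheap model call instead:

```python
# Compaction sketch: bound context growth from accumulated plans and
# reflections by summarizing everything but the most recent entries.

def compact(entries, keep_last=2):
    if len(entries) <= keep_last:
        return entries
    old, recent = entries[:-keep_last], entries[-keep_last:]
    summary = (f"[{len(old)} earlier entries compacted: "
               + "; ".join(e[:30] for e in old) + "]")
    return [summary] + recent
```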

Visibility

For end-user products: plans and reflections are useful UX (show progress, build trust). But they can also overwhelm. Defaults that work:

  • Surface plans, hide reflections.
  • Show high-level steps, hide low-level reasoning.
  • Provide an “explain” button for users who want depth.

Combining patterns

Many production agents stack patterns:

1. Plan upfront
2. Each step: ReAct (reason + tool)
3. Verify with a tool when possible (tests, schema)
4. On failure: reflect, replan
5. Final answer: self-critique pass

This is roughly the architecture of Claude Code on complex tasks.
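
The stacked control flow, reduced to its skeleton. All four callbacks are stand-ins for model/tool calls; the replan-on-failure loop is the outer structure, and a self-critique pass would slot in just before returning:

```python
# Combined-stack sketch: plan, execute, verify, and replan on failure.

def run(task, make_plan, execute, verify, reflect, max_rounds=3):
    history = []
    for _ in range(max_rounds):                     # 4. replan on failure
        plan = make_plan(task, history)             # 1. plan upfront
        results = [execute(step) for step in plan]  # 2. per-step execution
        answer = results[-1]                        #    (ReAct in a real agent)
        if verify(answer):                          # 3. verify when possible
            return answer                           # 5. self-critique goes here
        history.append(reflect(answer))             # reflect before replanning
    return None
```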

Pitfalls

  • Reflection without learning: the agent reflects but ignores its own conclusions.
  • Plans frozen too early: rigid plans don’t adapt; replan.
  • Endless self-critique: the agent can always find something to “improve.” Cap iterations.
  • Decomposition into too-small pieces: overhead exceeds benefit.
  • Pretending to plan: the agent generates a plan it doesn’t follow. Tie planning to actual execution.

See also